In sample surveys on sensitive characteristics, respondents often refuse to provide information. Examples of such characteristics include illegal income, monthly expenditure, the number of cigarettes smoked per day, the marks obtained in an examination, and the amount of tax payable. Such refusals result in a high rate of non-response in the collected data, which may badly affect the estimates of population parameters. To cope with refusals on sensitive variables, Warner (1965) proposed a strategy commonly called the randomized response technique. Warner’s (1965) randomization technique was limited to binary variables. Warner (1971) introduced another technique for situations where the sensitive variable of interest is quantitative. Eichhorn and Hayre (1983) suggested a quantitative randomized response model where multiplicative scrambling is used as opposed to the additive scrambling model of Warner (1971).
The concept of optional randomized response techniques was first studied by Gupta et al. (2002). In all of the existing versions of optional randomized response models, the respondents are free to either report the true response or report a scrambled response. Another optional randomized response technique was introduced by Bar-Lev et al. (2004) where a multiplicative scrambling noise is utilized as opposed to the additive scrambling in the Gupta et al. (2002) technique. Yan et al. (2008) introduced a measure for the respondent-privacy level ensured by a quantitative randomized response model. Diana and Perri (2011) introduced a randomized response procedure which utilizes both additive and multiplicative scrambling. Hussain et al. (2016) introduced a randomized response strategy which uses additive and subtractive scrambling. Gupta et al. (2018) presented a joint measure of privacy protection and efficiency for assessing the overall quality of quantitative randomized response models. Narjis and Shabbir (2021) proposed a modified variant of the Gjestvang and Singh (2009) model. Khalil et al. (2021) analyzed the influence of measurement errors on the estimators of the mean in sensitive surveys. Gupta et al. (2022) introduced a scrambled randomized response procedure which improved the Diana and Perri (2011) technique in terms of efficiency and privacy protection. Further research studies on randomized response models can be found in Kalucha et al. (2016), Murtaza et al. (2021), Yan et al. (2008), Young et al. (2019), and Zhang et al. (2021).
Besides simple random sampling, the ranked set sampling scheme can also be combined with the randomized response technique to obtain efficient estimates of the parameters of interest. For the detailed literature, one may refer to the studies of Mahdizadeh and Zamanzade (2021a, 2021b) and Mahdizadeh and Zamanzade (in press, 2022a, 2022b).
The next section presents some of the existing quantitative randomized response models.
Some Existing Quantitative Models and Evaluation Metrics
Let the population under consideration consist of N units, from which a simple random sample of n units is drawn with replacement. Further, let Y denote the sensitive variable of interest and S denote an additive scrambling variable, and let us assume that $E(Y) = \mu_Y$, $V(Y) = \sigma_Y^2$, $E(S) = 0$, and $V(S) = \sigma_S^2$. Moreover, let T be a multiplicative scrambling variable such that $E(T) = 1$ and $V(T) = \sigma_T^2$, where $\sigma_Y^2$, $\sigma_T^2$, and $\sigma_S^2$ are the population variances of the variables Y, T, and S, respectively, and $\mu_Y$ is the mean of the sensitive variable Y. It is further assumed that all variables are independent of each other. In this section, some existing quantitative scrambling techniques are presented.
The Warner (1971) Additive Model
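The reported responses under the Warner (1971) additive scrambling model are as follows:
$$Z = Y + S. \tag{1}$$
An unbiased estimator of the mean of Y based on the Warner (1971) model is given as:
$$\hat{\mu}_W = \frac{1}{n}\sum_{i=1}^{n} Z_i. \tag{2}$$
The variance of $\hat{\mu}_W$ is given as:
$$V(\hat{\mu}_W) = \frac{\sigma_Y^2 + \sigma_S^2}{n}. \tag{3}$$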
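The additive scrambling step can be illustrated with a minimal Python sketch. It assumes that S is normally distributed with mean 0, and the function name `warner_additive_estimate`, the seed, and the generated values are hypothetical choices made only for illustration; they are not data from any survey.

```python
import random
import statistics


def warner_additive_estimate(true_values, sigma_s, rng):
    """Scramble each value as Z = Y + S, S ~ N(0, sigma_s^2), and return mean(Z)."""
    reported = [y + rng.gauss(0.0, sigma_s) for y in true_values]
    return statistics.fmean(reported)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Hypothetical sensitive values (assumed normal with mean 15 and variance 5).
y = [rng.gauss(15.0, 5.0 ** 0.5) for _ in range(50)]
estimate = warner_additive_estimate(y, sigma_s=3.0 ** 0.5, rng=rng)
print(estimate)
```

Because the scrambling noise has mean zero, the sample mean of the reported values needs no correction; the price is the extra noise variance in the estimator.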
The Eichhorn and Hayre (1983) Model
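The reported responses under the Eichhorn and Hayre (1983) technique are as follows:
$$Z = TY. \tag{4}$$
An unbiased estimator of the mean of Y under the Eichhorn and Hayre (1983) technique is as follows:
$$\hat{\mu}_{EH} = \frac{1}{n}\sum_{i=1}^{n} Z_i. \tag{5}$$
The variance of $\hat{\mu}_{EH}$ is given as:
$$V(\hat{\mu}_{EH}) = \frac{\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_Y^2}{n}. \tag{6}$$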
The Diana and Perri (2011) Quantitative Model
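The reported responses under the Diana and Perri (2011) quantitative scrambling model are given as:
$$Z = TY + S. \tag{7}$$
An unbiased estimator of the mean of the sensitive variable of interest on the basis of the Diana and Perri (2011) technique is given as:
$$\hat{\mu}_{DP} = \frac{1}{n}\sum_{i=1}^{n} Z_i. \tag{8}$$
The variance of $\hat{\mu}_{DP}$ is given by:
$$V(\hat{\mu}_{DP}) = \frac{\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_Y^2 + \sigma_S^2}{n}. \tag{9}$$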
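The three variance expressions can be compared numerically with a short sketch. It assumes, as in this section, that E(S) = 0, E(T) = 1, and that Y, S, and T are mutually independent; the function names and the parameter values are illustrative choices, not part of the original models.

```python
def var_warner(var_y, var_s, n):
    """Variance of the sample mean of Z = Y + S."""
    return (var_y + var_s) / n


def var_eichhorn_hayre(mu_y, var_y, var_t, n):
    """Variance of the sample mean of Z = T * Y, with E(T) = 1."""
    return (var_t * (var_y + mu_y ** 2) + var_y) / n


def var_diana_perri(mu_y, var_y, var_t, var_s, n):
    """Variance of the sample mean of Z = T * Y + S."""
    return (var_t * (var_y + mu_y ** 2) + var_y + var_s) / n


# Illustrative parameter values (assumed for this sketch).
mu_y, var_y, var_t, var_s, n = 15.0, 5.0, 4.0, 3.0, 50
print(var_warner(var_y, var_s, n))
print(var_eichhorn_hayre(mu_y, var_y, var_t, n))
print(var_diana_perri(mu_y, var_y, var_t, var_s, n))
```

Multiplicative scrambling inflates the variance through the term involving the squared mean of Y, which is why the second and third values are far larger than the first when the mean of Y is large.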
The measure of privacy level due to Yan et al. (2008) for comparison of randomized response models is as follows:
$$\Delta = E(Z - Y)^2. \tag{10}$$
The higher the value of $\Delta$, the higher the level of privacy of the respondents provided by a particular randomized response model.
The joint measure of Gupta et al. (2018) for privacy and efficiency is as follows:
$$\delta = \frac{V(\hat{\mu})}{\Delta}. \tag{11}$$
From Equation 11, one can clearly observe that a lower value of $\delta$ is preferable.
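For Warner’s (1971) model, the measure of respondent-privacy is as follows:
$$\Delta_W = E(Z - Y)^2 = E(S^2) = \sigma_S^2. \tag{12}$$
The joint measure of efficiency and privacy for Warner’s (1971) model is given as:
$$\delta_W = \frac{\sigma_Y^2 + \sigma_S^2}{n\,\sigma_S^2}. \tag{13}$$
For the Eichhorn and Hayre (1983) quantitative technique, the measure of privacy is given by:
$$\Delta_{EH} = E(TY - Y)^2 = \sigma_T^2\,E(Y^2)$$
or
$$\Delta_{EH} = \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right). \tag{14}$$
The joint measure of model-efficiency and respondent-privacy for the Eichhorn and Hayre (1983) quantitative technique is given as:
$$\delta_{EH} = \frac{\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_Y^2}{n\,\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}. \tag{15}$$
The measure of privacy for the Diana and Perri (2011) model is given by:
$$\Delta_{DP} = E(TY + S - Y)^2 = \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_S^2. \tag{16}$$
The joint measure of privacy and efficiency for the Diana and Perri (2011) model is given as:
$$\delta_{DP} = \frac{\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_Y^2 + \sigma_S^2}{n\left[\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_S^2\right]}. \tag{17}$$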
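The privacy measure and the joint measure for the three existing models can be computed together, as in the following sketch. It assumes E(S) = 0, E(T) = 1, and mutual independence of Y, S, and T, so that the variance of the mean estimator equals the variance of Y plus the privacy measure, divided by n. The function name, the model labels, and the parameter values are hypothetical.

```python
def privacy_and_joint_measure(model, mu_y, var_y, var_s, var_t, n):
    """Return (privacy, joint measure) where privacy = E(Z - Y)^2."""
    second_moment_y = var_y + mu_y ** 2  # E(Y^2)
    if model == "warner":            # Z = Y + S
        privacy = var_s
    elif model == "eichhorn_hayre":  # Z = T * Y
        privacy = var_t * second_moment_y
    elif model == "diana_perri":     # Z = T * Y + S
        privacy = var_t * second_moment_y + var_s
    else:
        raise ValueError(f"unknown model: {model}")
    variance = (var_y + privacy) / n  # variance of the mean of Z
    return privacy, variance / privacy


# Illustrative parameter values (assumed for this sketch).
for name in ("warner", "eichhorn_hayre", "diana_perri"):
    print(name, privacy_and_joint_measure(name, 15.0, 5.0, 3.0, 4.0, 50))
```

Writing the joint measure as (variance of Y + privacy) / (n × privacy) makes the trade-off visible: more scrambling noise raises the variance but lowers the ratio.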
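In each of the proposed models, the respondents in the first group give the true response, so the measure of privacy for that group is zero. In the second group, the responses are scrambled exactly as in the corresponding existing model; the only difference is that the sample size $n_2$ is used in place of n. Since the expression for $\Delta$ under each model does not depend on the sample size, the value of $\Delta$ for each proposed model is the same as that of the corresponding existing model. That is, for the proposed Model I, the measure of privacy is given by:
$$\Delta_{p1} = \sigma_S^2. \tag{18}$$
For the proposed Model II, the measure of privacy is given by:
$$\Delta_{p2} = \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right). \tag{19}$$
For the proposed Model III, the measure of privacy is given by:
$$\Delta_{p3} = \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_S^2. \tag{20}$$
The joint measure of efficiency and privacy for the proposed Model I is given as:
$$\delta_{p1} = \frac{n\,\sigma_Y^2 + n_2\,\sigma_S^2}{n^2\,\sigma_S^2}. \tag{21}$$
The joint measure of efficiency and privacy for the proposed Model II is given as:
$$\delta_{p2} = \frac{n\,\sigma_Y^2 + n_2\,\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}{n^2\,\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}. \tag{22}$$
The joint measure of efficiency and privacy for the proposed Model III is given as:
$$\delta_{p3} = \frac{n\,\sigma_Y^2 + n_2\left[\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_S^2\right]}{n^2\left[\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_S^2\right]}. \tag{23}$$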
Proposed Models
In the proposed technique, the researcher asks each respondent whether he/she wants to report the true answer or prefers to report a scrambled response. The researcher records not only the response on the sensitive variable under study but also whether it is a true or a scrambled response. At the end of the data collection process, the researcher therefore knows how many of the collected responses are scrambled, and hence the respondents' preference between true and scrambled reporting. Let $n_1$ out of $n$ respondents disclose to the researcher that they are providing the true response without using the scrambling technique, and let the remaining $n_2 = n - n_1$ respondents prefer the scrambling technique for privacy protection. This section presents the modified versions of the models given in Section 2.
Proposed Model I
Motivated by Warner (1971) and Gupta et al. (2002), every participant is asked to either report the true response or use a scrambling procedure. Every respondent also has to tell the researcher whether his/her response is a true or scrambled response. This enables the researcher to know the exact number of respondents who opted for true response, and the number of respondents who opted for scrambled response. Under the proposed Model I, there are two groups of respondents:
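- The respondents who report the true response Y.
- The respondents who report the scrambled response $Z = Y + S$.
The mean of the first group is:
$$\bar{Y}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_i. \tag{24}$$
The mean of the second group is:
$$\bar{Z}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} Z_i. \tag{25}$$
The mean estimator of the sensitive variable under study is the weighted mean of the two groups. That is,
$$\hat{\mu}_{p1} = \frac{n_1\bar{Y}_1 + n_2\bar{Z}_2}{n}, \tag{26}$$
where $n = n_1 + n_2$.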
Proposed Model II
Motivated by Eichhorn and Hayre (1983), every respondent is requested to either report the true response or use a multiplicative scrambling. Every respondent also has to tell the researcher whether his/her response is true or scrambled. This enables the researcher to know the exact number of respondents who opted for true response, and the number of respondents who opted for scrambled response. Under the proposed Model II, there are two groups of respondents:
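- The respondents who report the true response Y.
- The respondents who report the scrambled response $Z = TY$.
The mean of the first group is:
$$\bar{Y}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_i. \tag{27}$$
The mean of the second group is:
$$\bar{Z}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} Z_i. \tag{28}$$
The mean estimator of the sensitive variable under study is the weighted mean of the two groups. That is,
$$\hat{\mu}_{p2} = \frac{n_1\bar{Y}_1 + n_2\bar{Z}_2}{n}. \tag{29}$$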
Proposed Model III
Motivated by Diana and Perri (2011), every respondent is requested to either report the true response or use a scrambling procedure. Every respondent also has to tell the researcher whether his/her response is true or scrambled. This enables the researcher to know the exact number of respondents who opted for true response, and the number of respondents who opted for scrambled response. Under the proposed Model III, there are two groups of respondents:
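- The respondents who report the true response Y.
- The respondents who report the scrambled response $Z = TY + S$.
The mean of the first group is:
$$\bar{Y}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_i. \tag{30}$$
The mean of the second group is:
$$\bar{Z}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} Z_i. \tag{31}$$
The mean estimator of the sensitive variable under study is the weighted mean of the two groups. That is,
$$\hat{\mu}_{p3} = \frac{n_1\bar{Y}_1 + n_2\bar{Z}_2}{n}. \tag{32}$$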
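The three proposed models share the same reporting protocol and differ only in the scrambling rule, which the following sketch tries to make concrete. It assumes normally distributed scrambling variables with E(S) = 0 and E(T) = 1; the function names `report` and `estimate_mean`, the default standard deviations, and the model labels "I", "II", "III" are hypothetical conveniences.

```python
import random


def report(y, model, rng, sigma_s=1.0, sigma_t=0.5, discloses_truth=False):
    """Return (reported value, is_true flag) for one respondent."""
    if discloses_truth:
        return y, True
    # Model I: Z = Y + S; Model II: Z = T * Y; Model III: Z = T * Y + S.
    s = rng.gauss(0.0, sigma_s) if model in ("I", "III") else 0.0
    t = rng.gauss(1.0, sigma_t) if model in ("II", "III") else 1.0
    return t * y + s, False


def estimate_mean(reports):
    """Weighted mean of the two group means: (n1*mean_true + n2*mean_scr) / n."""
    true_vals = [v for v, is_true in reports if is_true]
    scr_vals = [v for v, is_true in reports if not is_true]
    n1, n2 = len(true_vals), len(scr_vals)
    if n1 + n2 == 0:
        raise ValueError("no responses")
    mean_true = sum(true_vals) / n1 if n1 else 0.0
    mean_scr = sum(scr_vals) / n2 if n2 else 0.0
    return (n1 * mean_true + n2 * mean_scr) / (n1 + n2)


rng = random.Random(7)
# Hypothetical respondents: every third one discloses the true value.
reports = [report(3.0, "III", rng, discloses_truth=(i % 3 == 0)) for i in range(30)]
print(estimate_mean(reports))
```

Recording the flag alongside each value is what distinguishes this protocol from classical optional models, where the researcher does not learn which responses were scrambled.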
Mean and Variance
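This section presents the proof of unbiasedness and the derivation of the variances of the mean estimators under the proposed models. Throughout, $\bar{Y}_1$ and $\bar{Z}_2$ denote the means of the true-response group and the scrambled-response group, of sizes $n_1$ and $n_2$, respectively.
Theorem 1: The estimators $\hat{\mu}_{p1}$, $\hat{\mu}_{p2}$, and $\hat{\mu}_{p3}$ are unbiased estimators of the population mean $\mu_Y$.
Proof: Taking expectation on both sides of Equation 26 yields:
$$E(\hat{\mu}_{p1}) = \frac{n_1 E(\bar{Y}_1) + n_2 E(\bar{Z}_2)}{n}. \tag{33}$$
Taking expectation of Equations 24 and 25 yields:
$$E(\bar{Y}_1) = \mu_Y \tag{34}$$
and
$$E(\bar{Z}_2) = E(Y) + E(S) = \mu_Y. \tag{35}$$
Using Equations 34 and 35 in 33 yields:
$$E(\hat{\mu}_{p1}) = \frac{n_1\mu_Y + n_2\mu_Y}{n} = \mu_Y. \tag{36}$$
In a similar manner, the unbiasedness of $\hat{\mu}_{p2}$ and $\hat{\mu}_{p3}$ can be easily proved.
Theorem 2: The variances of the estimators $\hat{\mu}_{p1}$, $\hat{\mu}_{p2}$, and $\hat{\mu}_{p3}$ are given by:
$$V(\hat{\mu}_{p1}) = \frac{\sigma_Y^2}{n} + \frac{n_2\,\sigma_S^2}{n^2}, \tag{37}$$
$$V(\hat{\mu}_{p2}) = \frac{\sigma_Y^2}{n} + \frac{n_2\,\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}{n^2}, \tag{38}$$
$$V(\hat{\mu}_{p3}) = \frac{\sigma_Y^2}{n} + \frac{n_2\left[\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_S^2\right]}{n^2}. \tag{39}$$
Proof: Applying variance on both sides of Equation 26, and using the independence of the two groups, yields:
$$V(\hat{\mu}_{p1}) = \frac{n_1^2 V(\bar{Y}_1) + n_2^2 V(\bar{Z}_2)}{n^2}. \tag{40}$$
Applying variance on both sides of Equations 24 and 25 yields:
$$V(\bar{Y}_1) = \frac{\sigma_Y^2}{n_1} \tag{41}$$
and
$$V(\bar{Z}_2) = \frac{\sigma_Y^2 + \sigma_S^2}{n_2}. \tag{42}$$
Using Equations 41 and 42 in Equation 40 and simplifying yields:
$$V(\hat{\mu}_{p1}) = \frac{n_1\sigma_Y^2 + n_2\left(\sigma_Y^2 + \sigma_S^2\right)}{n^2}$$
or
$$V(\hat{\mu}_{p1}) = \frac{\sigma_Y^2}{n} + \frac{n_2\,\sigma_S^2}{n^2}.$$
Using the same procedure and assuming independence of variables, the variances of $\hat{\mu}_{p2}$ and $\hat{\mu}_{p3}$ can be easily obtained.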
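The variances in Theorem 2 can be evaluated with one small helper, sketched below. The sketch rests on an observation that holds under the stated assumptions (E(S) = 0, E(T) = 1, independence): in each model the scrambled response has variance equal to the variance of Y plus the scrambling noise E(Z − Y)². The function name and the parameter values are hypothetical.

```python
def proposed_variance(noise, var_y, n1, n2):
    """Variance of the weighted mean when n1 respondents report Y and
    n2 report a scrambled Z with Var(Z) = var_y + noise."""
    n = n1 + n2
    return var_y / n + n2 * noise / n ** 2


# Illustrative parameter values (assumed for this sketch).
mu_y, var_y, var_t, var_s = 15.0, 5.0, 4.0, 3.0
noise_by_model = {
    "I": var_s,                                    # Z = Y + S
    "II": var_t * (var_y + mu_y ** 2),             # Z = T * Y
    "III": var_t * (var_y + mu_y ** 2) + var_s,    # Z = T * Y + S
}
for model, noise in noise_by_model.items():
    print(model, proposed_variance(noise, var_y, n1=10, n2=40))
```

Setting `n1=0` recovers the variance of the corresponding existing model, and every respondent who moves from the scrambled group to the true-response group removes `noise / n**2` from the variance.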
An Application of the Proposed Technique
The proposed Model III was applied to estimate the true mean Grade Point Average (GPA) of the 175 students currently enrolled in the undergraduate program of the Department of Statistics, University of Malakand, Pakistan. A simple random sample of 40 students was drawn from these students. Each of the 40 selected students was asked whether he/she wanted to report the true GPA. If the student’s answer was ‘yes’, he/she reported his/her true GPA. A respondent who did not want to report his/her true GPA was given a deck of 100 cards along with a calculator. Each card had two random numbers printed on it, one for variable T and the other for variable S. Both sets of random numbers were generated from normal distributions: mean 0 and variance 0.5 for the additive scrambling variable S, and mean 1 and variance 0.5 for the multiplicative scrambling variable T. The respondents who opted for a scrambled response were told not to disclose their true GPA and not to show the selected card to the interviewer, which ensured their privacy protection. Out of the 40 students, 16 reported the true GPA, whereas the remaining 24 opted for a scrambled response. The responses reported by the 40 sampled students are presented in Table 1.
Table 1
Responses Reported by Students
True Response | True Response | True Response | True Response | Scrambled Response | Scrambled Response | Scrambled Response | Scrambled Response | Scrambled Response | Scrambled Response |
---|---|---|---|---|---|---|---|---|---|
2.78 | 3.41 | 2.88 | 3.16 | 2.9677 | 4.3116 | 2.7810 | 3.3618 | 3.5319 | 2.4298 |
3.75 | 2.47 | 1.99 | 3.33 | 1.5986 | 2.9468 | 2.6090 | 3.7874 | 4.0074 | 1.9924 |
3.90 | 3.64 | 2.43 | 1.88 | 3.8477 | 1.8653 | 2.9668 | 4.4793 | 1.3270 | 4.6992 |
2.58 | 3.16 | 2.24 | 1.98 | 2.7437 | 3.3362 | 1.6973 | 3.4518 | 3.1946 | 2.6173 |
In Table 1, one may observe that some of the reported scrambled responses exceed 4.0 although the students’ actual GPA is on a scale of 0 to 4.0. If the researcher generates random numbers from a normal distribution with a large mean or variance, the reported scrambled responses may take large values that look unnatural for a GPA dataset. Moreover, this may lead to overestimation of the true mean GPA, since the estimates are calculated from the observed responses. Researchers are therefore advised to choose the parameters of the scrambling distributions carefully, so that the reported scrambled responses do not deviate too much from the possible range of the quantitative variable of interest. In the given example, most of the scrambled responses fall within the possible range of the GPA, which is from 0 to 4.
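The Model III estimate for these data can be computed with a few lines of Python. The sketch assumes that the first four columns of Table 1 hold the 16 true responses and the remaining six columns hold the 24 scrambled responses, and that no correction of the scrambled mean is needed because S has mean 0 and T has mean 1; the variable names are hypothetical.

```python
# Values copied from Table 1 (row by row within each group).
true_gpa = [
    2.78, 3.41, 2.88, 3.16,
    3.75, 2.47, 1.99, 3.33,
    3.90, 3.64, 2.43, 1.88,
    2.58, 3.16, 2.24, 1.98,
]
scrambled_gpa = [
    2.9677, 4.3116, 2.7810, 3.3618, 3.5319, 2.4298,
    1.5986, 2.9468, 2.6090, 3.7874, 4.0074, 1.9924,
    3.8477, 1.8653, 2.9668, 4.4793, 1.3270, 4.6992,
    2.7437, 3.3362, 1.6973, 3.4518, 3.1946, 2.6173,
]

n1, n2 = len(true_gpa), len(scrambled_gpa)
mean_true = sum(true_gpa) / n1            # mean of the true-response group
mean_scrambled = sum(scrambled_gpa) / n2  # mean of the scrambled group
# Weighted mean of the two group means.
mu_hat = (n1 * mean_true + n2 * mean_scrambled) / (n1 + n2)
print(round(mean_true, 4), round(mean_scrambled, 4), round(mu_hat, 4))
```

Under these assumptions the weighted estimate comes out at roughly 2.95, between the two group means.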
Efficiency Comparison
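The suggested Model I is more efficient than Warner’s (1971) model if:
$$V(\hat{\mu}_{p1}) \le V(\hat{\mu}_W)$$
or
$$\frac{\sigma_Y^2}{n} + \frac{n_2\,\sigma_S^2}{n^2} \le \frac{\sigma_Y^2 + \sigma_S^2}{n}$$
or
$$n_2 \le n. \tag{43}$$
Condition 43 always holds, since $n_2 = n - n_1$; the inequality is strict whenever $n_1 > 0$.
The suggested Model II is more efficient than the Eichhorn and Hayre (1983) model if:
$$V(\hat{\mu}_{p2}) \le V(\hat{\mu}_{EH})$$
or
$$\frac{\sigma_Y^2}{n} + \frac{n_2\,\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}{n^2} \le \frac{\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_Y^2}{n}$$
or
$$n_2 \le n. \tag{44}$$
Condition 44 always holds.
The suggested Model III is more efficient than the Diana and Perri (2011) model if:
$$V(\hat{\mu}_{p3}) \le V(\hat{\mu}_{DP})$$
or
$$\frac{\sigma_Y^2}{n} + \frac{n_2\left[\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_S^2\right]}{n^2} \le \frac{\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_Y^2 + \sigma_S^2}{n}$$
or
$$n_2 \le n. \tag{45}$$
Condition 45 always holds.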
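The size of the gain can also be written out explicitly. The following worked step is a sketch that assumes the variance expressions implied by the model definitions, namely $(\sigma_Y^2 + \Delta)/n$ for an existing model and $\sigma_Y^2/n + n_2\Delta/n^2$ for its proposed counterpart, where $\Delta = E(Z - Y)^2$ is the scrambling noise of the model and $n_1 = n - n_2$:

```latex
\begin{align*}
V(\hat{\mu}_{W}) - V(\hat{\mu}_{p1})
  &= \frac{\sigma_S^2}{n} - \frac{n_2\,\sigma_S^2}{n^2}
   = \frac{n_1\,\sigma_S^2}{n^2}, \\
V(\hat{\mu}_{EH}) - V(\hat{\mu}_{p2})
  &= \frac{n_1\,\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}{n^2}, \\
V(\hat{\mu}_{DP}) - V(\hat{\mu}_{p3})
  &= \frac{n_1\left[\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right) + \sigma_S^2\right]}{n^2}.
\end{align*}
```

Each difference is non-negative and grows linearly in $n_1$. For example, with $\sigma_S^2 = 3$, $n_1 = 10$, and $n = 50$, the gain of Model I over Warner’s model is $10 \times 3 / 2500 = 0.012$.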
Table 2 displays the variances of the mean estimator under the Warner (1971) and the Eichhorn and Hayre (1983) scrambling models, the Diana and Perri (2011) quantitative model, and the three proposed models for various choices of the scrambling variances and of the group sizes $n_1$ and $n_2$. One may clearly observe the improvement in efficiency of the proposed models over the existing models.
Table 2
Variances of the Mean Under Different Models
$\sigma_T^2$ | $\sigma_S^2$ | $n_1$ | $n_2$ | W | EH | DP | p1 | p2 | p3 |
---|---|---|---|---|---|---|---|---|---|
4 | 3 | 10 | 40 | 0.16 | 18.50 | 18.56 | 0.15 | 14.82 | 14.87 |
4 | 3 | 20 | 30 | 0.16 | 18.50 | 18.56 | 0.14 | 11.14 | 11.18 |
4 | 3 | 30 | 20 | 0.16 | 18.50 | 18.56 | 0.12 | 7.46 | 7.48 |
4 | 3 | 40 | 10 | 0.16 | 18.50 | 18.56 | 0.11 | 3.78 | 3.79 |
4 | 6 | 10 | 40 | 0.22 | 18.50 | 18.62 | 0.20 | 14.82 | 14.92 |
4 | 6 | 20 | 30 | 0.22 | 18.50 | 18.62 | 0.17 | 11.14 | 11.21 |
4 | 6 | 30 | 20 | 0.22 | 18.50 | 18.62 | 0.15 | 7.46 | 7.51 |
4 | 6 | 40 | 10 | 0.22 | 18.50 | 18.62 | 0.12 | 3.78 | 3.80 |
8 | 5 | 10 | 40 | 0.20 | 36.90 | 37.00 | 0.18 | 29.54 | 29.62 |
8 | 5 | 20 | 30 | 0.20 | 36.90 | 37.00 | 0.16 | 22.18 | 22.24 |
8 | 5 | 30 | 20 | 0.20 | 36.90 | 37.00 | 0.14 | 14.82 | 14.86 |
8 | 5 | 40 | 10 | 0.20 | 36.90 | 37.00 | 0.12 | 7.46 | 7.48 |
8 | 10 | 10 | 40 | 0.30 | 36.90 | 37.10 | 0.26 | 29.54 | 29.70 |
8 | 10 | 20 | 30 | 0.30 | 36.90 | 37.10 | 0.22 | 22.18 | 22.30 |
8 | 10 | 30 | 20 | 0.30 | 36.90 | 37.10 | 0.18 | 14.82 | 14.90 |
8 | 10 | 40 | 10 | 0.30 | 36.90 | 37.10 | 0.14 | 7.46 | 7.50 |
12 | 8 | 10 | 40 | 0.26 | 55.30 | 55.46 | 0.23 | 44.26 | 44.39 |
12 | 8 | 20 | 30 | 0.26 | 55.30 | 55.46 | 0.20 | 33.22 | 33.32 |
12 | 8 | 30 | 20 | 0.26 | 55.30 | 55.46 | 0.16 | 22.18 | 22.24 |
12 | 8 | 40 | 10 | 0.26 | 55.30 | 55.46 | 0.13 | 11.14 | 11.17 |
12 | 15 | 10 | 40 | 0.40 | 55.30 | 55.60 | 0.34 | 44.26 | 44.50 |
12 | 15 | 20 | 30 | 0.40 | 55.30 | 55.60 | 0.28 | 33.22 | 33.40 |
12 | 15 | 30 | 20 | 0.40 | 55.30 | 55.60 | 0.22 | 22.18 | 22.30 |
12 | 15 | 40 | 10 | 0.40 | 55.30 | 55.60 | 0.16 | 11.14 | 11.20 |
Note. $\mu_Y = 15$, $\sigma_Y^2 = 5$, $n = 50$. W, EH, DP, p1, p2, p3 = variance of the mean estimator under the Warner (1971), the Eichhorn and Hayre (1983), the Diana and Perri (2011), and the three proposed models, respectively.
Table 3 displays the improvement in terms of $\delta$ values over the existing models.
Table 3
$\delta$ Values for Different Models
$\sigma_T^2$ | $\sigma_S^2$ | $n_1$ | $n_2$ | W | EH | DP | p1 | p2 | p3 |
---|---|---|---|---|---|---|---|---|---|
4 | 3 | 10 | 40 | 0.053333 | 0.020109 | 0.020108 | 0.049333 | 0.016109 | 0.016108 |
4 | 3 | 20 | 30 | 0.053333 | 0.020109 | 0.020108 | 0.045333 | 0.012109 | 0.012108 |
4 | 3 | 30 | 20 | 0.053333 | 0.020109 | 0.020108 | 0.041333 | 0.008109 | 0.008108 |
4 | 3 | 40 | 10 | 0.053333 | 0.020109 | 0.020108 | 0.037333 | 0.004109 | 0.004108 |
4 | 6 | 10 | 40 | 0.036667 | 0.020109 | 0.020108 | 0.032667 | 0.016109 | 0.016108 |
4 | 6 | 20 | 30 | 0.036667 | 0.020109 | 0.020108 | 0.028667 | 0.012109 | 0.012108 |
4 | 6 | 30 | 20 | 0.036667 | 0.020109 | 0.020108 | 0.024667 | 0.008109 | 0.008108 |
4 | 6 | 40 | 10 | 0.036667 | 0.020109 | 0.020108 | 0.020667 | 0.004109 | 0.004108 |
8 | 5 | 10 | 40 | 0.04 | 0.020054 | 0.020054 | 0.036 | 0.016054 | 0.016054 |
8 | 5 | 20 | 30 | 0.04 | 0.020054 | 0.020054 | 0.032 | 0.012054 | 0.012054 |
8 | 5 | 30 | 20 | 0.04 | 0.020054 | 0.020054 | 0.028 | 0.008054 | 0.008054 |
8 | 5 | 40 | 10 | 0.04 | 0.020054 | 0.020054 | 0.024 | 0.004054 | 0.004054 |
8 | 10 | 10 | 40 | 0.03 | 0.020054 | 0.020054 | 0.026 | 0.016054 | 0.016054 |
8 | 10 | 20 | 30 | 0.03 | 0.020054 | 0.020054 | 0.022 | 0.012054 | 0.012054 |
8 | 10 | 30 | 20 | 0.03 | 0.020054 | 0.020054 | 0.018 | 0.008054 | 0.008054 |
8 | 10 | 40 | 10 | 0.03 | 0.020054 | 0.020054 | 0.014 | 0.004054 | 0.004054 |
12 | 8 | 10 | 40 | 0.0325 | 0.020036 | 0.020036 | 0.0285 | 0.016036 | 0.016036 |
12 | 8 | 20 | 30 | 0.0325 | 0.020036 | 0.020036 | 0.0245 | 0.012036 | 0.012036 |
12 | 8 | 30 | 20 | 0.0325 | 0.020036 | 0.020036 | 0.0205 | 0.008036 | 0.008036 |
12 | 8 | 40 | 10 | 0.0325 | 0.020036 | 0.020036 | 0.0165 | 0.004036 | 0.004036 |
12 | 15 | 10 | 40 | 0.026667 | 0.020036 | 0.020036 | 0.022667 | 0.016036 | 0.016036 |
12 | 15 | 20 | 30 | 0.026667 | 0.020036 | 0.020036 | 0.018667 | 0.012036 | 0.012036 |
12 | 15 | 30 | 20 | 0.026667 | 0.020036 | 0.020036 | 0.014667 | 0.008036 | 0.008036 |
12 | 15 | 40 | 10 | 0.026667 | 0.020036 | 0.020036 | 0.010667 | 0.004036 | 0.004036 |
Note. $\mu_Y = 15$, $\sigma_Y^2 = 5$, $n = 50$. W, EH, DP, p1, p2, p3 = $\delta$ under the Warner (1971), the Eichhorn and Hayre (1983), the Diana and Perri (2011), and the three proposed models, respectively.
Simulation Study
To show the improvement in efficiency and privacy protection, a simulation study was carried out by generating an artificial population of N = 5000 units from a normal distribution with mean 200 and variance 25. For the additive scrambling variable S, the random numbers were generated using a normal distribution with mean 0 and variance 1.5625. For the multiplicative scrambling variable T, the random numbers were generated using a normal distribution with mean 1 and variance 1.5625. A total of 1000 iterations of sample selection were run, using the sample size n = 1000 at each iteration. The simulated bias in the mean estimator under each of the three proposed models is presented in Table 4. Likewise, the simulated variances are reported in Table 5, with the corresponding $\delta$ values in Table 6. Observing Tables 4, 5, and 6, one may clearly see the improvement over the existing models. In Table 4, most of the simulated values of bias are close to zero for all three proposed models, which is consistent with the unbiasedness proved in Equation 36.
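A scaled-down version of such a simulation can be sketched as follows. The sketch is not the authors' code: it assumes that the first $n_1$ sampled units report the truth and the remaining $n_2$ scramble, that sampling is with replacement, and that all names (`simulate`, `reps`, the seed) are hypothetical. Fewer replications are used than in the study to keep the run short.

```python
import random
import statistics


def simulate(model, n1, n2, sigma_s, sigma_t, population, reps, rng):
    """Return (simulated bias, simulated variance) of the weighted mean estimator."""
    n = n1 + n2
    estimates = []
    for _ in range(reps):
        sample = rng.choices(population, k=n)  # simple random sample with replacement
        total = sum(sample[:n1])               # first n1 units report the true value
        for y in sample[n1:]:                  # remaining n2 units scramble
            s = rng.gauss(0.0, sigma_s) if model in ("I", "III") else 0.0
            t = rng.gauss(1.0, sigma_t) if model in ("II", "III") else 1.0
            total += t * y + s
        estimates.append(total / n)
    bias = statistics.fmean(estimates) - statistics.fmean(population)
    return bias, statistics.variance(estimates)


rng = random.Random(2023)
# Artificial population: N = 5000, mean 200, variance 25 (standard deviation 5).
population = [rng.gauss(200.0, 5.0) for _ in range(5000)]
# Scrambling variance 1.5625 corresponds to a standard deviation of 1.25.
bias, var = simulate("I", 200, 800, 1.25, 1.25, population, reps=50, rng=rng)
print(bias, var)
```

With a fixed population and moderate `reps`, the simulated bias fluctuates around zero and the simulated variance around its theoretical value; more replications tighten both.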
Table 4
Simulated Bias in the Mean Estimator Under the Proposed Models
$\sigma_S$ / $\sigma_T$ | $n_1$ | $n_2$ | Bias (p1) | Bias (p2) | Bias (p3) |
---|---|---|---|---|---|
1.25 | 200 | 800 | -0.04231722 | -0.3359392 | -0.3343549 |
1.25 | 400 | 600 | -0.05178339 | 0.06455729 | 0.06581804 |
1.25 | 500 | 500 | -0.04452886 | 0.04366002 | 0.04456227 |
1.25 | 600 | 400 | -0.04710829 | -0.03758789 | -0.03710422 |
1.25 | 800 | 200 | -0.03654629 | -0.2208454 | -0.2205013 |
1.5 | 200 | 800 | -0.04200036 | -0.3943467 | -0.3924455 |
1.5 | 400 | 600 | -0.05153124 | 0.08807758 | 0.08959048 |
1.5 | 500 | 500 | -0.04434841 | 0.06147825 | 0.06256095 |
1.5 | 600 | 400 | -0.04701155 | -0.03558708 | -0.03500667 |
1.5 | 800 | 200 | -0.03647746 | -0.2576364 | -0.2572234 |
1.75 | 200 | 800 | -0.04168349 | -0.4527542 | -0.4505362 |
1.75 | 400 | 600 | -0.05127909 | 0.1115979 | 0.1133629 |
1.75 | 500 | 500 | -0.04416796 | 0.07929647 | 0.08055963 |
1.75 | 600 | 400 | -0.04691482 | -0.03358627 | -0.03290912 |
1.75 | 800 | 200 | -0.03640864 | -0.2944274 | -0.2939456 |
2 | 200 | 800 | -0.04136663 | -0.5111618 | -0.5086269 |
2 | 400 | 600 | -0.05102694 | 0.1351182 | 0.1371354 |
2 | 500 | 500 | -0.04398751 | 0.0971147 | 0.09855831 |
2 | 600 | 400 | -0.04681808 | -0.03158545 | -0.03081157 |
2 | 800 | 200 | -0.03633981 | -0.3312184 | -0.3306678 |
Table 5
Simulated Variances of the Mean Under the Proposed and Existing Models
$\sigma_S$ / $\sigma_T$ | $n_1$ | $n_2$ | W | p1 | EH | p2 | DP | p3 |
---|---|---|---|---|---|---|---|---|
1.25 | 200 | 800 | 0.02383096 | 0.02360185 | 61.79515 | 48.81937 | 61.81969 | 48.83813 |
1.25 | 400 | 600 | 0.02319297 | 0.02303067 | 58.02994 | 35.89469 | 58.05022 | 35.90795 |
1.25 | 500 | 500 | 0.02290962 | 0.02259029 | 64.59142 | 30.46517 | 64.57387 | 30.45292 |
1.25 | 600 | 400 | 0.02425798 | 0.02365736 | 61.87235 | 24.96826 | 61.91008 | 24.97025 |
1.25 | 800 | 200 | 0.02252473 | 0.02109507 | 63.99624 | 11.9735 | 64.02271 | 11.97824 |
1.5 | 200 | 800 | 0.02453836 | 0.02419761 | 88.99663 | 70.30539 | 89.03195 | 70.33238 |
1.5 | 400 | 600 | 0.0238225 | 0.02350492 | 83.54286 | 51.69235 | 83.57212 | 51.71143 |
1.5 | 500 | 500 | 0.02352331 | 0.02296447 | 93.01462 | 43.88139 | 92.98943 | 43.86372 |
1.5 | 600 | 400 | 0.02486492 | 0.0239517 | 89.089 | 35.94049 | 89.14334 | 35.94329 |
1.5 | 800 | 200 | 0.02324659 | 0.02123525 | 92.14935 | 17.22565 | 92.18741 | 17.23246 |
1.75 | 200 | 800 | 0.02537135 | 0.02489702 | 121.1471 | 95.70026 | 121.1951 | 95.73697 |
1.75 | 400 | 600 | 0.02457703 | 0.02406322 | 113.6929 | 70.36438 | 113.7328 | 70.39033 |
1.75 | 500 | 500 | 0.02426035 | 0.02340347 | 126.6076 | 59.74012 | 126.5734 | 59.71605 |
1.75 | 600 | 400 | 0.02558184 | 0.02429185 | 121.2546 | 48.90709 | 121.3285 | 48.91086 |
1.75 | 800 | 200 | 0.02409082 | 0.02139922 | 125.4218 | 23.43166 | 125.4736 | 23.44092 |
2 | 200 | 800 | 0.02632994 | 0.02570009 | 158.2465 | 125.004 | 158.3092 | 125.0519 |
2 | 400 | 600 | 0.02545657 | 0.02470559 | 148.48 | 91.91078 | 148.5322 | 91.94466 |
2 | 500 | 500 | 0.02512075 | 0.02390727 | 165.3703 | 78.04135 | 165.3257 | 78.0099 |
2 | 600 | 400 | 0.02640872 | 0.02467781 | 158.3691 | 63.86808 | 158.4657 | 63.87295 |
2 | 800 | 200 | 0.02505742 | 0.02158696 | 163.8136 | 30.59154 | 163.8811 | 30.60362 |
Table 6
Simulated $\delta$ Values of the Proposed and Existing Models
$\sigma_S$ / $\sigma_T$ | $n_1$ | W | p1 | EH | p2 | DP | p3 |
---|---|---|---|---|---|---|---|
1.25 | 200 | 0.0151905 | 0.0150673 | 0.0009907802 | 0.0007811312 | 0.0009911831 | 0.0007814225 |
1.25 | 400 | 0.01490502 | 0.0148246 | 0.0009260774 | 0.0005744241 | 0.0009263716 | 0.0005746333 |
1.25 | 500 | 0.01472327 | 0.0145062 | 0.001029526 | 0.0004839192 | 0.001029244 | 0.0004837062 |
1.25 | 600 | 0.01557841 | 0.0152525 | 0.0009908548 | 0.0004004777 | 0.0009914397 | 0.0004004859 |
1.25 | 800 | 0.01445102 | 0.01365198 | 0.001022456 | 0.0001915471 | 0.001022894 | 0.0001916116 |
1.5 | 200 | 0.01086234 | 0.01072791 | 0.0009909185 | 0.0007811924 | 0.000991321 | 0.0007814833 |
1.5 | 400 | 0.01062884 | 0.01050509 | 0.0009258519 | 0.0005744713 | 0.0009261468 | 0.0005746804 |
1.5 | 500 | 0.01049757 | 0.01024015 | 0.001029553 | 0.0004840504 | 0.001029271 | 0.000483837 |
1.5 | 600 | 0.01108908 | 0.01072322 | 0.000990769 | 0.00040031 | 0.0009913538 | 0.0004003176 |
1.5 | 800 | 0.01035557 | 0.009544028 | 0.001022394 | 0.0001913655 | 0.001022831 | 0.0001914299 |
1.75 | 200 | 0.008251483 | 0.008109662 | 0.0009910293 | 0.0007812483 | 0.0009914316 | 0.0007815389 |
1.75 | 400 | 0.008054057 | 0.007899927 | 0.0009257027 | 0.0005745169 | 0.0009259983 | 0.0005747259 |
1.75 | 500 | 0.007953403 | 0.007666759 | 0.001029583 | 0.000484156 | 0.001029302 | 0.0004839422 |
1.75 | 600 | 0.008381798 | 0.007989669 | 0.0009907202 | 0.0004002028 | 0.0009913048 | 0.0004002098 |
1.75 | 800 | 0.007883378 | 0.007066392 | 0.001022361 | 0.0001912472 | 0.001022797 | 0.0001913115 |
2 | 200 | 0.006556218 | 0.006409192 | 0.000991119 | 0.0007812969 | 0.0009915211 | 0.0007815872 |
2 | 400 | 0.006385273 | 0.006208672 | 0.0009255975 | 0.0005745577 | 0.0009258934 | 0.0005747667 |
2 | 500 | 0.006304594 | 0.005995812 | 0.001029612 | 0.0004842417 | 0.001029332 | 0.0004840276 |
2 | 600 | 0.006624443 | 0.006213837 | 0.0009906904 | 0.0004001292 | 0.000991275 | 0.0004001358 |
2 | 800 | 0.006277043 | 0.005457874 | 0.001022342 | 0.0001911647 | 0.001022778 | 0.0001912289 |
Discussion and Conclusion
This paper presents an alternative procedure to the so-called optional quantitative randomized response models. Modified versions of the Warner (1971), the Eichhorn and Hayre (1983), and the Diana and Perri (2011) models were analyzed in previous sections. The efficiency conditions are strong and always hold, which shows that suggested modified variants are superior to the existing versions.
Observing Table 2 and Table 3, the improvement over the existing methods may be seen for various choices of the scrambling variances and of the group sizes $n_1$ and $n_2$. Table 3 shows the improvement in terms of $\delta$ values over the existing models. It is observed that the suggested Model I is superior to the Warner (1971) model, Model II is better than the Eichhorn and Hayre (1983) quantitative model, and the proposed Model III is better than the Diana and Perri (2011) model. Moreover, one may observe that among the proposed models, Model I is the best model in terms of efficiency. However, the proposed Model III is the best model if $\delta$ values are taken into account. It is also observed that as $n_1$ increases, the variance of the mean for each of the proposed models decreases. This means that as the number of respondents opting for a true response increases, the efficiency of the models increases. Researchers are therefore advised to motivate the respondents to opt for a true response as far as possible. This will minimize the number of those opting for scrambled responses, thus resulting in efficient estimates of the mean.
Table 4 shows that among the three proposed models, Model I produces less simulated bias than Model II and Model III, which makes Model I the best of the three models in situations where unbiasedness is the priority for model selection. Moreover, the proposed Model I utilizes only additive scrambling, which makes it simpler than the proposed Model III, where the respondents have to scramble their response using both additive and multiplicative scrambling. The proposed Model I is also much more efficient than the proposed Model II and Model III. However, Table 6 shows that the simulated values of the joint measure of privacy and efficiency under Model I are the worst among the three proposed models. Further, one may also observe from Table 5 that the proposed Model II and Model III are nearly equally efficient, but Model II is better in terms of simplicity as it only uses multiplicative scrambling. The proposed Model III, on the other hand, provides a higher level of privacy protection since the respondents use both additive and multiplicative scrambling to report their responses.
The current study analyzed the efficiency of the mean estimator under the suggested alternative to the optional randomized response models. It would be interesting for future research to study the estimation of other parameters, such as the population median, variance, and proportion, under the suggested randomized response models.