- Question: author anonymization
In our application field, we don't need to hide the authors' identities. Is there any possibility of improving efficiency or accuracy by not anonymizing the authors? If so, I would like to know the difference.
Answer: In general, the situation becomes a bit simpler if the anonymization layer is dropped. But if we want to be able to give an estimate for a new (not yet rated) document, we have to maintain the mapping between an author and his documents or objects. The reputation of a new document is then derived from the reputation of the author's previous contributions. If anonymization is dropped, this mapping becomes public, but it is still needed. The users' identities or pseudonyms are represented as indices (integers) with respect to the reputation system. The anonymous reputation system (in our case, Anonymization Layer + EigenTrust) still has the complete information about all given ratings, but that detailed information is private.
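As an illustration of that mapping, here is a minimal Python sketch; all names, and the choice of the mean as the aggregation rule, are our assumptions for illustration, not part of the system description:

```python
# Hypothetical sketch: author pseudonyms are integer indices, and a new
# document inherits its initial reputation from the author's prior work.
from collections import defaultdict

author_docs = defaultdict(list)   # pseudonym index -> list of document IDs
doc_reputation = {}               # document ID -> reputation in [0, 1]

def add_document(author, doc):
    """Register a new, not-yet-rated document and seed its reputation
    from the author's previous contributions (here: their mean score)."""
    prior = [doc_reputation[d] for d in author_docs[author]
             if d in doc_reputation]
    initial = sum(prior) / len(prior) if prior else 0.0  # no history, no reputation
    author_docs[author].append(doc)
    doc_reputation[doc] = initial
    return initial

author_docs[7].append(101)        # author 7 already has one rated document
doc_reputation[101] = 0.8
print(add_document(7, 102))       # -> 0.8: the new document starts there
```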
These ratings enter the reputation system EigenTrust. The EigenTrust algorithm then computes the effective reputation (the trust vector). The output is that the reputation of peer i is a real number between 0 (not trusted, no reputation at all) and 1 (trusted, high reputation). If that number is, e.g., r = 0.3159012398, our Anonymization Layer artificially reduces the accuracy of the published reputation information by rounding it to a set of discrete reputation levels {0, 0.1, 0.2, ..., 0.9, 1}, here r = 0.3. In that respect you are right: the accuracy can be improved by skipping that step. We round the results in order to prevent certain side-channel attacks. Our system makes it impossible, for any two documents, to tell whether they originate from the same author or not.
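The rounding step itself is trivial; a short Python sketch (the level set {0, 0.1, ..., 1} is the one given above, the function name is ours):

```python
def publish_level(r, levels=10):
    """Round a raw EigenTrust reputation r in [0, 1] to the nearest of the
    levels + 1 discrete values, so the exact score is never published."""
    return round(r * levels) / levels

print(publish_level(0.3159012398))  # -> 0.3
```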
If the users agree to reveal their identities both as authors and as rating peers, that is no longer needed.
The reason why anonymity and privacy may be desired becomes clearer in the answer to question 2.
- Question: Robustness to cheating
What does cheating mean here? Several different scenarios are thinkable.
Answer: A group of (really existing) people works together in order to increase its own reputation or to decrease the reputation of another peer. That is difficult to prevent. From one point of view it is not cheating at all, because that is the genuine opinion of the members of that group. If enough people find the contents of another peer good, that peer has a high reputation within that group, and that is fine. Fortunately, EigenTrust limits the influence of those ratings on other groups that think differently. It does so by selecting some pre-trusted peers.
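For reference, a compact sketch of the EigenTrust iteration with pre-trusted peers; the damping value a = 0.15 and all names are illustrative assumptions, see the original EigenTrust paper for the exact algorithm:

```python
import numpy as np

def eigentrust(C, p, a=0.15, tol=1e-10, max_iter=1000):
    """C[i, j] is peer i's normalized local trust in peer j (rows sum to 1);
    p is the distribution over pre-trusted peers. Blending p into every
    step caps the influence a colluding cluster can gain."""
    t = p.copy()                          # start from the pre-trusted peers
    for _ in range(max_iter):
        t_next = (1 - a) * C.T @ t + a * p
        if np.abs(t_next - t).sum() < tol:
            break
        t = t_next
    return t                              # t[i] in [0, 1] is peer i's reputation

# Tiny example: peers 0 and 1 rate only each other; peer 2 is pre-trusted.
C = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
p = np.array([0.0, 0.0, 1.0])
print(eigentrust(C, p))
```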
A user might try to increase his reputation by self-rating (rating his own contributions). If we check the user's identity, we can prevent that.
The user can then create one or multiple additional accounts to rate his own contents as good. A more serious attack is to artificially create such groups or clusters. (This is the same as cheating Google PageRank by generating interlinked web sites.) The cheating user creates 10,000 accounts and can in that way multiply his impact on the reputation system. This is known in the literature as a Sybil attack. Here the system has to verify the user's identity. That is often implemented by checking an e-mail address, but a user can have multiple e-mail addresses: if one owns a domain, one can easily construct a million valid addresses. Here we need a stronger proof of identity. The system has to enforce the policy that every person has only one account in the reputation system. It is the same as in democratic elections: every citizen has only one vote. We have seen that a simple e-mail subscription is not sufficient to enforce that.
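To make the Sybil effect concrete, here is a toy extension of the EigenTrust sketch above; the topology and all numbers are invented for illustration. Five Sybil accounts rate each other in a ring, one honest peer was tricked into a small rating of a Sybil, and only peer 0 is pre-trusted:

```python
import numpy as np

n, a = 7, 0.15
C = np.zeros((n, n))
C[0, 1] = 1.0                     # honest peer 0 trusts peer 1
C[1, 0], C[1, 2] = 0.9, 0.1       # peer 1 was tricked into rating Sybil 2
for i in range(2, n):
    C[i, 2 + (i - 1) % 5] = 1.0   # Sybil ring 2 -> 3 -> 4 -> 5 -> 6 -> 2
p = np.zeros(n)
p[0] = 1.0                        # only peer 0 is pre-trusted

t = p.copy()
for _ in range(1000):             # same iteration as in the sketch above
    t = (1 - a) * C.T @ t + a * p

# The five Sybils share only the trust that leaks in from peer 1; creating
# more Sybil accounts does not create reputation out of nothing.
print(t[2:].sum(), t[:2])
```

In this toy model the pre-trusted peers already bound what the cluster can gain; as argued above, a real deployment still needs identity verification on top of that.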
A real identity provider would help in that situation. That could be, e.g., the mobile provider. It can verify that the person gives his real and only identity (name, first name, date of birth) for account creation. Consequently, one user can only have one account, which is fine. But the question is whether the users will accept such a strict identity subscription. If all private data became public, that would repel the users. I wouldn't subscribe to such a service if everyone could learn my real name and date of birth just by looking at a file I have provided. And in our proposed architecture it is not necessary to make that data public. In that respect, question 2 shows that the Anonymization Layer helps to create acceptance for measures against cheating. The proposed authentication mechanisms (user to Reputation Provider) are necessary in any case, even without anonymization. The authentication (Reputation Provider to user) can be skipped in the non-anonymous case, but the resulting saving is small.