Saturday, November 2, 2013

Trust? ==> 0 or 100 ... or perhaps 42 ?

When I ask Google voice search:

"What time is it in Mountain View?"

Why do I trust the answer returned?

And when I ask:

"Who invented the world?"

why does the answer bring tears of laughter to my eyes?

You know the answer should have been:


If you didn't know that, I recommend this book.

The first level of trust in these two examples is in speech recognition. The app provides feedback (the text it recognized from my speech), so I can check whether the transcription is correct. Elementary, but undeniably necessary to establish trust in the answer.

Next comes the answer. In the first example, over a short period of time I came to trust the kind of services that provide the current time all around the world. That trust is based on my knowledge that these services are not difficult to build, and that the only thing that matters is correctly identifying the place asked for.

And the second case? Does it degrade my built-up trust in the quality of Google's search results?
Nope! But why?
Save for the 42, I knew that there was no perfect answer, but I can also explain why +Tim Berners-Lee popped up, and I know how this service could be improved so that it does not make mistakes like that.
If Google had used its Knowledge Graph, which it obviously didn't, it would have known that "World" and "World Wide Web" are different things.

When exchanging information with the help of computer systems, three kinds of "things" need to build up their own trustworthiness:
- services (that they do what they are supposed to do)
- persons
- information

The rest of this post will essentially deal with the last element of this list, but let's have a quick look at the first two.

Trust in services
How do we evaluate the trustworthiness of services? I imagine that my first trip in a self-driving car will be extremely stressful for me (not for the car, of course). During the fifth trip I'll read and answer my mail, and on the tenth I'll take a nap.
Btw, I tend to include organizations here too, but that's of course open to discussion.
Positive feedback from other persons using the same service accelerates the trust building. Negative feedback on the other hand ...
This naturally leads to 

Trust in persons
Complex matter. Trusting that actions will be executed timely and correctly? That the information provided is true? 
For me the key elements are: trust evolves over time; doubt can pop up at any moment; other people's opinions matter (they certainly influence the initial trust you have in someone), but less than your own perceptions; and finally, trust in persons is very fine-grained (on some aspects I can trust someone almost completely, on others less). If there is contact (direct or through some communication system - video, phone), body language (including voice) is an important factor.
Referring to the negative feedback above: if that feedback was given by a person I trust (on that specific topic, e.g. an authority in the field), it might far outweigh thousands of positive reviews from people I do not know.
The feedback provided, well ... it's information

Trust in information
The level of trust we have in information depends on the trust we have in the original provider, the trust we have in the whole chain of intermediate transmissions and/or transformations up to the trust we have in our own sensory input.

I like to make a distinction between carbon-based and silicon-based producers (although "silicon" is perhaps a bit too imprecise).

Starting with the latter, we can separate them into two categories: the ones providing raw data (e.g. a surveillance video camera) and the ones providing interpreted data (a temperature sensor, a smoke detector or a smart collision-detection camera in a tunnel). These last ones can be seen as a combination of a raw sensor and one or more chained interpretation services (software in most cases).
As sensors are relatively cheap, multiplying the number of sensors can often validate the information produced. In the case of interpreted data, we should of course prefer different interpretation services, to avoid or detect errors in a specific one.
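To make the redundancy idea a bit more tangible, here is a minimal sketch of cross-checking several sensors that measure the same thing. The function name `cross_check`, the `tolerance` parameter and the sample temperatures are my own assumptions for illustration, and using the median as consensus is just one simple choice among many.

```python
from statistics import median

def cross_check(readings, tolerance):
    """Return (consensus, suspects) for redundant sensor readings.

    consensus is the median of all readings; suspects lists the indices
    of readings that deviate from it by more than tolerance.
    """
    if not readings:
        raise ValueError("need at least one reading")
    consensus = median(readings)
    suspects = [i for i, value in enumerate(readings)
                if abs(value - consensus) > tolerance]
    return consensus, suspects

# Three hypothetical temperature sensors in the same room; the third is off.
consensus, suspects = cross_check([21.4, 21.6, 35.0], tolerance=1.0)
```

With three sensors, one faulty reading cannot move the median far, which is why the odd one out shows up in `suspects`. With only two sensors you would notice a disagreement, but not know which one to believe.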
Taking a helicopter view, what is a search engine? Input: the Internet (text, images, video); services: indexing, filtering; output: references related to a query. Or like in this example: input: YouTube still frames; output: cats and faces.
Thus the frontier between traditional sensors of our physical world and sensors of our digital worlds (a crawl bot, or a collision detector in a virtual world) is perhaps not as clear-cut as it might appear initially.

What remains is the last element, the human producer, before we get to the essence of this post (sorry for the long introduction). Yes, I know that dogs are also producers (of ...).

In some sense we are a bunch of sensors operating since our birth, a bunch of input interpretation services called the brain, a bunch of restitution services called memory (recent neurological research shows that different kinds of input are stored in different parts of our memory) and finally a bunch of production services (voice, body language and the fingers used to type this post - dictation is not good enough yet).
All of them are subject to errors.


I forgot one. The only one which doesn't make any errors (by definition): creation services.

Trustworthy? Make up your own mind.

Apparently there is a strong need for some kind of validation (a term silently introduced above), especially for human-produced information. At least if we feel the need to think about whether to trust it or not.

My ideal information world?
OK +Google Glass  (or some other device or app), show me how reliable this information is and why.

- independence of original sources
- trustworthiness of the independent sources
- chain of dependent sources
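
How could a device combine these three ingredients into "how reliable and why"? Below is a rough sketch, not a real scoring method: the names (`origin`, `reliability`, `derived_from`, `trust`), the sample sources and the combination rule (treating independent origins as independent chances of being right) are all assumptions made up for this illustration.

```python
from math import prod

def origin(source, derived_from):
    """Follow the chain of dependent sources back to the original one."""
    seen = {source}
    while source in derived_from:
        source = derived_from[source]
        if source in seen:  # guard against circular citations
            raise ValueError("circular chain of sources")
        seen.add(source)
    return source

def reliability(reporters, derived_from, trust):
    """Return (score, origins) for a claim reported by several sources.

    Only distinct original sources count; copies add nothing. Unknown
    origins get a trust of 0.0. The score assumes the origins are
    independent of each other.
    """
    origins = sorted({origin(s, derived_from) for s in reporters})
    score = 1 - prod(1 - trust.get(o, 0.0) for o in origins)
    return score, origins

# Hypothetical example: two blogs repeat one agency, one paper is independent.
derived_from = {"blog_a": "agency", "blog_b": "blog_a"}
trust = {"agency": 0.8, "paper_x": 0.5}
score, origins = reliability(["blog_a", "blog_b", "paper_x"], derived_from, trust)
```

Here three reporters collapse into two independent origins, and the returned `origins` list is the "why" part: it tells you whose word the score actually rests on.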

The toolbox:
- annotate
- annotate
- annotate

Big data? Sure!
Feasible? Sure!
Done? Nope! (only partially for recent productions: retweets and re-shares e.g.).

Now it's time to use the magic word "Semantics".
One thing that is necessary is establishing the identity of the producer. Mind that for humans this is not necessarily an identification of the physical person; an avatar is fine. Nothing wrong with having multiple identities. There is only a need for enough data to establish some meaningful (initial) state of trustworthiness, authority or whatever term you would like to use.
The other is the meaning (read: semantics) of the information produced. Extracting it gets better and better, but there is still much room for improvement (or perhaps we should not go for the 10% improvement but for the 10x solution, i.e. come up with an entirely new way). This is needed to trace back when and by whom the same information was given. Copy-paste of text is easy to detect, but we are still in the infancy of detecting copy-paste of meaning.
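
To illustrate the "easy" half of that last sentence, here is a minimal sketch of detecting copy-paste of text by comparing overlapping word sequences (shingles) with a Jaccard ratio. The function names, the shingle size of 3 and the sample sentences are my own choices for illustration.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(a, b, size=3):
    """Jaccard similarity of the shingle sets of two texts (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0  # texts shorter than one shingle cannot be compared
    return len(sa & sb) / len(sa | sb)

original = "the web was invented by Tim Berners-Lee"
copied = "the web was invented by Tim Berners-Lee"
reworded = "Tim Berners-Lee created the World Wide Web"

same_text = overlap(original, copied)
same_meaning = overlap(original, reworded)
```

The verbatim copy scores 1.0, while the reworded sentence, which says the same thing, scores 0.0. That gap is exactly the copy-paste of meaning this kind of technique cannot see.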

And there is a facilitating role we all have: cite your sources! This is common in scientific publications. But elsewhere? We will have another facilitating role, but that will be the subject of another post.

Why this post now and here?
How many books have been scanned and/or are available online? How many articles are online? And blog posts? Not to mention all the rest. The source data is there. And so are the (not yet perfect) techniques.
Computer power is also available.

What are we waiting for?

