<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.encyclosphere.org/index.php?action=history&amp;feed=atom&amp;title=Standardized_reader_stat_features</id>
	<title>Standardized reader stat features - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.encyclosphere.org/index.php?action=history&amp;feed=atom&amp;title=Standardized_reader_stat_features"/>
	<link rel="alternate" type="text/html" href="https://wiki.encyclosphere.org/index.php?title=Standardized_reader_stat_features&amp;action=history"/>
	<updated>2026-05-14T09:52:42Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://wiki.encyclosphere.org/index.php?title=Standardized_reader_stat_features&amp;diff=447&amp;oldid=prev</id>
		<title>Hampson at 17:09, 16 February 2024</title>
		<link rel="alternate" type="text/html" href="https://wiki.encyclosphere.org/index.php?title=Standardized_reader_stat_features&amp;diff=447&amp;oldid=prev"/>
		<updated>2024-02-16T17:09:03Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:09, 16 February 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== I. Purpose ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== I. Purpose ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Publishers greatly desire access to statistical data collected by Encyclosphere readers, and there is no reason not to make such data available to them, as long as it can be collected and shared without exposing any data that should not be exposed on privacy grounds. By giving publishers access to this data, we incentivize them to participate more actively; if we do not give them this data, our aggregators appear to be competing with them, which is an awful impression to leave, because it is simply not true.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Publishers greatly desire access to statistical data collected by &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[&lt;/ins&gt;Encyclosphere&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]] &lt;/ins&gt;readers, and there is no reason not to make such data available to them, as long as it can be collected and shared without exposing any data that should not be exposed on privacy grounds. By giving publishers access to this data, we incentivize them to participate more actively; if we do not give them this data, our aggregators appear to be competing with them, which is an awful impression to leave, because it is simply not true.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Such data really needs to be standardized for ease of access and aggregation. Thus, preliminarily, we need to answer three questions:&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Such data really needs to be standardized for ease of access and aggregation. Thus, preliminarily, we need to answer three questions:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l27&quot;&gt;Line 27:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 27:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== V. Standardizing analytics data storage and API ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== V. Standardizing analytics data storage and API ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;We should also stay in communication whenever anyone starts working on an analytics system, so that we adopt (or create) the same standard for storing analytics data and making it available via an API. Basically, someone should be able to download the analytics data from 2024 for EncycloReader and EncycloSearch using the same code, and use the same software to view the outputted files.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;We should also stay in communication whenever anyone starts working on an analytics system, so that we adopt (or create) the same standard for storing analytics data and making it available via an API. Basically, someone should be able to download the analytics data from 2024 for &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[&lt;/ins&gt;EncycloReader&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;.org|EncycloReader]] &lt;/ins&gt;and &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[EncycloSearch.org|&lt;/ins&gt;EncycloSearch&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]] &lt;/ins&gt;using the same code, and use the same software to view the outputted files.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Hampson</name></author>
	</entry>
	<entry>
		<id>https://wiki.encyclosphere.org/index.php?title=Standardized_reader_stat_features&amp;diff=10&amp;oldid=prev</id>
		<title>Lsanger: Created page with &quot; == I. Purpose == Publishers greatly desire access to statistical data collected by Encyclosphere readers, and there is no reason not to make such data available to them, as long as it can be collected and shared without exposing any data that should not be exposed on privacy grounds. By giving publishers access to this data, we incentivize them to participate more actively; if we do not give them this data, our aggregators appear to be competing with them, which is an a...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.encyclosphere.org/index.php?title=Standardized_reader_stat_features&amp;diff=10&amp;oldid=prev"/>
		<updated>2024-01-19T19:27:56Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot; == I. Purpose == Publishers greatly desire access to statistical data collected by Encyclosphere readers, and there is no reason not to make such data available to them, as long as it can be collected and shared without exposing any data that should not be exposed on privacy grounds. By giving publishers access to this data, we incentivize them to participate more actively; if we do not give them this data, our aggregators appear to be competing with them, which is an a...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
== I. Purpose ==&lt;br /&gt;
Publishers greatly desire access to statistical data collected by Encyclosphere readers, and there is no reason not to make such data available to them, as long as it can be collected and shared without exposing any data that should not be exposed on privacy grounds. By giving publishers access to this data, we incentivize them to participate more actively; if we do not give them this data, our aggregators appear to be competing with them, which is an awful impression to leave, because it is simply not true.&lt;br /&gt;
&lt;br /&gt;
Such data really needs to be standardized for ease of access and aggregation. Thus, preliminarily, we need to answer three questions:&lt;br /&gt;
&lt;br /&gt;
* What data ''should not'' be collected, because it represents a privacy violation of end users?&lt;br /&gt;
* Of the remaining data, which ''should'' be collected, because it would be useful?&lt;br /&gt;
* How should the data be collected, so as to avoid the enumerated privacy violations and so as to ensure the useful data is collected?&lt;br /&gt;
&lt;br /&gt;
== II. Data ''not'' to collect ==&lt;br /&gt;
Reflecting on the sort of data that is typically logged by web analytics software, there are two types of data that represent a privacy incursion. First, IP addresses: they expose a great deal of information, because they are sometimes (indeed often) unique. Second, personally identifiable data from which an individual's identity can be inferred, especially behavioral data, must be avoided. In the latter category, debatably, would be referrers. The problem with referrers is that, because they are so fine-grained (even when generalized), they can frequently&lt;br /&gt;
&lt;br /&gt;
Generalized or anonymized data is acceptable, but one must not publish anonymized data that can be tied to particular individuals due to small sample size. For example, suppose a spy wants to find out whether an individual from Wyoming has been looking at a certain biography of a government-identified terrorist. Suppose only one person viewed that article, and the (anonymized) data showed that that person was from Wyoming. That would be a failure to respect privacy.&lt;br /&gt;
&lt;br /&gt;
== III. What data ''should'' be collected ==&lt;br /&gt;
The most important numbers publishers want are, of course, page views and unique views for each article. Page views for article components, such as images, are less important, but might be &amp;quot;nice to have.&amp;quot; It is also extremely interesting to publishers to know where their traffic is coming from, down at least to the state and national level. Search engine and social media referrals make good sense to include, if generalized. Data should be recorded and made available at a variety of levels of time granularity, down to daily, ''except'' when the numbers are so small as to permit identifiability. Beyond this, there really is nothing terribly important.&lt;br /&gt;
&lt;br /&gt;
== IV. How to collect the data ==&lt;br /&gt;
IP addresses should never be made available. It should also be impossible to supply an IP address, apply a common hashing algorithm, and determine a match with some data that is supplied.&lt;br /&gt;
&lt;br /&gt;
Generally, a metric (whether broken down by geography, time, or anything else) should not be published unless its unique count is at least 10. For example, if only 9 visitors from Ohio visited the &quot;George Washington&quot; article in December 2023, then no page should reveal that fact; but if 10 visitors from New York visited the article in December 2023, then it is acceptable to reveal that fact. Of course, the 9 December visitors should still be counted in the summary 2023 data, assuming at least 10 visitors from Ohio visited the article in 2023.&lt;br /&gt;
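&lt;br /&gt;
The threshold rule above could be sketched roughly as follows. This is only an illustration, not an agreed design: the tuple layout, the YYYY-MM period strings, the visitor tokens, and the function names are all assumptions made for the example.&lt;br /&gt;

```python
from collections import defaultdict

MIN_UNIQUE = 10  # threshold taken from the rule above


def publishable_counts(visits, min_unique=MIN_UNIQUE):
    """Count unique visitors per (article, region, period) cell.

    `visits` is an iterable of (article, region, period, visitor_token)
    tuples; this layout is an assumption made for the sketch.
    Cells with fewer than `min_unique` unique visitors are left out.
    """
    cells = defaultdict(set)
    for article, region, period, token in visits:
        cells[(article, region, period)].add(token)
    return {
        cell: len(tokens)
        for cell, tokens in cells.items()
        if len(tokens) >= min_unique
    }


def roll_up_to_year(visits):
    """Rewrite 'YYYY-MM' periods as 'YYYY' so the same rule applies per year."""
    return [(a, r, period[:4], t) for a, r, period, t in visits]


# Hypothetical sample data mirroring the example in the text.
article = "George Washington"
visits = (
    [(article, "Ohio", "2023-12", f"oh-{i}") for i in range(9)]
    + [(article, "Ohio", "2023-11", f"oh-nov-{i}") for i in range(3)]
    + [(article, "New York", "2023-12", f"ny-{i}") for i in range(10)]
)

monthly = publishable_counts(visits)
yearly = publishable_counts(roll_up_to_year(visits))
```

With this sample data, the monthly output contains only the New York cell, while the yearly output also includes Ohio, whose 12 unique visitors for 2023 clear the threshold.&lt;br /&gt;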
&lt;br /&gt;
Data should be stored in a hashed/encrypted form until a generation routine determines what is to be published (presumably once a day). Such granular source data must ''never'' be made available via API. IP addresses should not merely be hashed immediately; they should be hashed in such a way that even the developer cannot recover them.&lt;br /&gt;
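&lt;br /&gt;
One possible way to meet the hashing requirements above is a keyed hash (HMAC-SHA256) whose key exists only in memory and is thrown away after each generation run. This technique is not named in the proposal; it is offered here only as a sketch, and the class and method names are hypothetical.&lt;br /&gt;

```python
import hashlib
import hmac
import secrets


class DailyVisitorTokenizer:
    """Turns an IP address into an opaque token using a key that is never stored."""

    def __init__(self):
        # Assumption: the key lives only in process memory, never on disk.
        self._key = secrets.token_bytes(32)

    def token(self, ip_address):
        """Return a hex token; the same IP maps to the same token until rotate()."""
        digest = hmac.new(self._key, ip_address.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    def rotate(self):
        """Discard the old key, e.g. after the daily generation routine has run."""
        self._key = secrets.token_bytes(32)


tokenizer = DailyVisitorTokenizer()
first = tokenizer.token("203.0.113.7")  # documentation-range address
```

Because the key is random and secret, hashing a known IP address with a common algorithm such as plain SHA-256 does not reproduce the stored token. Once the key has been discarded, nobody, including the developer, can test candidate addresses against old tokens. The trade-off in this sketch is that a visitor cannot be recognized as the same unique visitor across key rotations.&lt;br /&gt;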
&lt;br /&gt;
Obviously, we will want to make sure we follow relevant laws and regulations, but the above plans should satisfy even the most stringent regulations.&lt;br /&gt;
&lt;br /&gt;
== V. Standardizing analytics data storage and API ==&lt;br /&gt;
We should also stay in communication whenever anyone starts working on an analytics system, so that we adopt (or create) the same standard for storing analytics data and making it available via an API. Basically, someone should be able to download the analytics data from 2024 for EncycloReader and EncycloSearch using the same code, and use the same software to view the output files.&lt;/div&gt;</summary>
		<author><name>Lsanger</name></author>
	</entry>
</feed>