[05:10:21] 10Analytics, 10Analytics-EventLogging, 10DBA, 10Operations, 10ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (10Marostegui) [05:10:44] 10Analytics, 10Analytics-EventLogging, 10DBA, 10Operations, 10ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (10Marostegui) p:05Triage→03Normal [05:11:14] 10Analytics, 10Analytics-EventLogging, 10DBA, 10Operations, 10ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (10Marostegui) I've acked the alert for now. [05:51:21] 10Analytics, 10Analytics-EventLogging, 10DBA, 10Operations, 10ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (10elukey) Thanks a lot @Marostegui! We can shutdown the host without any problem, it only needs a ~10m heads up to properly stop eve... [06:19:48] 10Analytics, 10Analytics-Data-Quality, 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: Some registered users have null values for event_user_text and event_user_text_historical in mediawiki_history - https://phabricator.wikimedia.org/T218463 (10Neil_P._Quinn_WMF) >>! In T218463#5124954, @JAl... [07:11:35] Morning :) [07:14:14] bonjour :) [07:34:20] joal: wondering if you want to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/506609/ [07:34:51] elukey: YES :) [07:34:55] Let's do it [07:36:36] all right! [07:36:47] disabled puppet on stat/notebook/etc.. hosts [07:36:59] elukey: on which host do we test that? [07:37:25] joal: analytics1030 is a good first candidate (test cluster), then I'd say stat1004 ? [07:37:32] ack! [07:40:57] joal: done on analytics1030, chown is needed [07:45:00] afaics sudo chown -R analytics-deploy:analytics-deploy * is enough [07:45:17] (I tried with find etc.. but all files are owned by 'analytics') [07:45:35] elukey: in /srv/deploy/analytics right? 
[07:45:46] 10Analytics, 10Analytics-EventLogging, 10DBA, 10Operations, 10ops-eqiad: db1107 (eventlogging db master) possibly memory issues - https://phabricator.wikimedia.org/T222050 (10Marostegui) a:03Cmjohnson Thanks - not sure how to proceed as the `dmesg` entries show that the issue is fixed but icinga is sti... [07:47:35] joal: yep [07:47:40] right [07:47:42] not from / :D [07:47:47] :) [07:50:25] proceeding with stat1004 [07:55:42] joal: looks good, proceeding with the rest [07:55:55] ack elukey! [07:56:14] elukey: I'll try a scap deploy once ready if ok for you :) [07:56:24] yep makes sense [08:10:34] joal: all done [08:13:15] we can try to scap deploy if you have time [08:13:41] elukey: I do have time [08:13:50] elukey: puppet renabled? [08:15:01] elukey: also, I'm guessing puppet has already run on deployment.eqiad [08:15:27] joal: it is not needed in there [08:15:36] Ah ? [08:15:40] Ok :) [08:15:47] files are owned by threbuchet [08:16:02] they are owned by analytics-deploy only on targets [08:16:04] so all good [08:16:06] oooooh [08:16:11] ok great [08:16:32] joal: ah one sec [08:16:45] to be completely accurate, lemme also remove the 'analytics' user [08:17:07] yessir [08:21:54] joal: good to go! [08:21:59] ok elukey :) [08:22:36] !log Deploying refinery using scap (analytics-deploy user test) [08:22:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:23:28] fail elukey :( [08:23:57] 08:23:14 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'analytics/refinery', '-g', 'canary', 'fetch', '--refresh-config'] on stat1007.eqiad.wmnet returned [255]: Received disconnect from 2620:0:861:105:10:64:21:118 port 22:2: Too many authentication failures [08:24:01] Authentication failed. 
[08:24:04] :S [08:24:37] joal: there is surely something to restart, lemme check [08:25:34] ah probably the keyholder [08:36:37] ok restarted the keyholder on deployment server with no luck [08:38:44] :( [08:42:53] so I can access from deploy1001 to stat1004 with the analytics-deploy user [08:43:06] using the keyholder [08:43:14] so the transport/auth works [08:43:25] it is probably a matter of old config [08:44:20] » [08:50:00] 08:49:41 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'analytics/refinery', '-g', 'default', 'fetch', '--refresh-config'] [08:50:03] 08:49:41 Unable to find keyholder key for analytics [08:50:03] joal: --^ [08:50:13] this is the issue [08:50:44] ahhhh the scap configgggggg [08:52:00] (03PS1) 10Elukey: Replace the 'analytics' user with 'analytics-deploy' [analytics/refinery/scap] - 10https://gerrit.wikimedia.org/r/506951 (https://phabricator.wikimedia.org/T220971) [08:52:05] joal: --^ [08:55:33] (03CR) 10Elukey: [V: 03+2 C: 03+2] Replace the 'analytics' user with 'analytics-deploy' [analytics/refinery/scap] - 10https://gerrit.wikimedia.org/r/506951 (https://phabricator.wikimedia.org/T220971) (owner: 10Elukey) [08:56:34] ah elukey - Thanks for that [08:56:41] I am testing scap deploy -v --no-log-message -l stat1004.eqiad.wmnet "Test deployment for new user" [08:56:44] seems working [08:56:55] \o/ [08:57:13] still waiting for its completion though [08:58:43] done! 
[08:58:54] joal: if you want to test as well it would be great [08:59:00] doing so elukey [08:59:43] elukey: proceeding - seems good [09:00:12] elukey: we also probably want to delete some scap deployed versions (size on disk) [09:01:29] joal: scap should only keep the last two by config [09:01:47] Ahhhh - My bad sorry - Forgot that [09:19:53] joal: all right then, next step is to create the 'analytics' user :) [09:27:20] elukey: looks like me deploying is causing a space issue on notebook1004 (or so I assume) [09:28:06] already cleaned up [09:28:15] Many thanks :) [09:28:29] I think that the revs kept were 3 waiting for the clean up [09:28:58] in theory we could even use 1 IIUC from what tyler wrote in the code review [09:29:49] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/scap/+/494870/1/scap.cfg@11 [10:35:00] a-team: forgot to mention on Friday, going off for today! Just sent an email explaining :) [10:35:10] I'll re-check later on in case I am needed [10:35:41] elukey: excuse me luca this is unacceptable [10:35:57] I knoowwww [11:11:46] joal: I seeeeeee something weirrrrdddddd in aqs [11:12:14] look at the dates in the results: [11:12:17] https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/all-projects/all-access/user/monthly/2017040100/2019043000 [11:12:34] that's all good, it returns results from April 2017 [11:13:02] however, look at [11:13:03] https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/all-projects/all-access/user/monthly/2017033100/2019043000 [11:13:39] the results come from May 2017 even though the start day is March 31, is this a known issue joal? [11:15:21] fdans: This is the first time I've encountered this error (or at least so my memory tells me) [11:15:53] joal: yeah I had never noticed this, don't know if it's an old problem [11:16:17] but I was freaking out on wikistats because the timeselector was behaving weird [11:19:13] fdans: only related to pageviews, right?
[11:19:24] fdans: works correctly for wikistats2 endpoints? [11:23:09] fdans: interesting finding in nodejs [11:24:13] fdans: https://gist.github.com/jobar/28c5a9e7e51a6b3144a1edf372699891 [11:26:06] fdans: April only has 30 days, the original date-day is 31, so bumping month by one makes the date be April 31st --> May 1st :) [11:26:19] Hurray for date manipulation [11:26:45] fdans: Can you please create a ticket for this? [11:26:48] We should fix [11:27:24] yes definitely joal [11:28:38] (03CR) 10Lucas Werkmeister (WMDE): Apply all trivial auto-fixes to the PHP code style [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/506698 (owner: 10Lucas Werkmeister (WMDE)) [11:48:13] (03CR) 10Ladsgroup: [C: 03+2] Apply all trivial auto-fixes to the PHP code style [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/506698 (owner: 10Lucas Werkmeister (WMDE)) [11:48:19] (03CR) 10Ladsgroup: [C: 03+2] Add missing limits to explode() and fix PHPDoc tags [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/506699 (owner: 10Lucas Werkmeister (WMDE)) [11:48:24] (03CR) 10Ladsgroup: [C: 03+2] Update MediaWiki CodeSniffer to version 25.0.0 [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/506700 (owner: 10Lucas Werkmeister (WMDE)) [11:48:27] (03CR) 10Ladsgroup: [C: 03+2] Fix typo in author name "Adddshore" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/506701 (owner: 10Lucas Werkmeister (WMDE)) [11:48:35] (03Merged) 10jenkins-bot: Apply all trivial auto-fixes to the PHP code style [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/506698 (owner: 10Lucas Werkmeister (WMDE)) [11:48:40] (03Merged) 10jenkins-bot: Add missing limits to explode() and fix PHPDoc tags [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/506699 (owner: 10Lucas Werkmeister (WMDE)) [11:48:49] (03Merged) 10jenkins-bot: Update MediaWiki CodeSniffer to version 25.0.0 [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/506700 (owner: 10Lucas
Werkmeister (WMDE)) [11:48:52] (03Merged) 10jenkins-bot: Fix typo in author name "Adddshore" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/506701 (owner: 10Lucas Werkmeister (WMDE)) [11:51:26] 10Analytics: Setting day 31 as start date on pageviews returns results from 1 month later than expected - https://phabricator.wikimedia.org/T222062 (10fdans) [13:44:42] ottomata: o/ [13:44:55] I added https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/504067/ to this week's SRE meeting etherpad [13:45:07] with you as point of contact, sorry I will not be able to attend :( [13:45:25] would you mind to answer any question that might arise? I added all the details to the etherpad [13:45:48] it is basically to allow all the team to restart the jupyter systemd units [13:45:52] (in case it is needed) [13:48:07] can do! [13:50:26] thankssss [13:50:46] also let me know if you agree with the change, I thought I had your +1 but I was mistaken :( [13:52:08] 10Analytics, 10User-Elukey: Check if HDFS offers a way to prevent/limit/throttle users to overwhelm the HDFS Namenode - https://phabricator.wikimedia.org/T220702 (10elukey) Found a good series of performance tuning for the Namenode's RPC handling: * https://community.hortonworks.com/articles/43838/scaling-the... [14:03:37] ottomata: last thing - I dont' have any update for the SRE meeting, not sure if you want to update them about eventgate [14:30:22] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (10Ottomata) Hm, I think I mostly get what you are proposing, except I'm not sure h... [14:35:17] elukey: ya I can! [14:35:37] elukey: ya +1 for sure [14:39:04] ack thanks! [14:41:44] * elukey off! 
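Editor's sketch of the day-31 rollover that joal found in nodejs and fdans filed as T222062. This is not the AQS code and not the contents of the linked gist; it only reproduces the standard JavaScript `Date` behaviour described in the chat. The function names `addMonthNaive` and `addMonthFromMonthStart` are hypothetical, and the second one is just one possible way to avoid the overflow, not necessarily the fix that was applied.

```javascript
// Naive "add one month": the day-of-month is kept, so a start day of 31
// overflows when the following month is shorter.
function addMonthNaive(date) {
  const d = new Date(date.getTime());
  d.setUTCMonth(d.getUTCMonth() + 1);
  return d;
}

// Hypothetical alternative: ignore the day and step from the first of the
// month, so there is no day-of-month left to overflow.
function addMonthFromMonthStart(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth() + 1, 1));
}

const start = new Date(Date.UTC(2017, 2, 31)); // 2017-03-31 (months are 0-based)
console.log(addMonthNaive(start).toISOString().slice(0, 10));          // 2017-05-01
console.log(addMonthFromMonthStart(start).toISOString().slice(0, 10)); // 2017-04-01
```

With the naive version, a monthly series that starts on 2017-03-31 skips April entirely, which matches the symptom seen with the 2017033100 start timestamp.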
[15:00:51] ping ottomata , elukey , joal , milimetric , stadddupppp [15:09:59] (03CR) 10Ottomata: [C: 03+1] Make saltrotate store salts with timestamps as file name. (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/484250 (https://phabricator.wikimedia.org/T212014) (owner: 10Mforns) [15:11:27] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Mediawiki History Release - 2019-04 snapshot - https://phabricator.wikimedia.org/T221824 (10Nuria) [15:11:47] 10Analytics, 10Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (10leila) @elukey I'm working with mtizzoni and panisson to figure out what to keep. This requires a deeper look at the data and Legal/Security input for a possible release. I expect this to be delayed... [15:22:47] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Discovery, and 5 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Ottomata) > Update https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Tra... [15:23:44] 10Analytics: Setting day 31 as start date on pageviews returns results from 1 month later than expected - https://phabricator.wikimedia.org/T222062 (10fdans) p:05Triage→03Normal [15:25:43] 10Analytics, 10Analytics-Wikistats: Add "Top used photos" metric - https://phabricator.wikimedia.org/T220485 (10fdans) p:05Triage→03Low [15:28:01] 10Analytics, 10Product-Analytics, 10Epic, 10User-Elukey: Add wikidata ids to data lake tables - https://phabricator.wikimedia.org/T221890 (10fdans) @Groceryheist can you elaborate a bit on what exactly you're trying to accomplish? We're trying to understand exactly what we have to do here. 
[15:28:32] 10Analytics, 10Product-Analytics, 10Epic, 10User-Elukey: Add wikidata ids to data lake tables - https://phabricator.wikimedia.org/T221890 (10fdans) p:05Triage→03Normal [15:29:27] 10Analytics: Sqoop e-mail is emailing errors in try1 for actions that suceeed in try 3 - https://phabricator.wikimedia.org/T203811 (10fdans) 05Open→03Resolved a:03fdans This doesn't seem to be happening anymore. [15:36:57] (03CR) 10Fdans: [V: 03+2] Remove leftover files in oozie folders [analytics/refinery] - 10https://gerrit.wikimedia.org/r/504914 (https://phabricator.wikimedia.org/T221460) (owner: 10Joal) [15:38:35] 10Analytics, 10Research, 10Article-Recommendation, 10Patch-For-Review: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (10fdans) [15:41:32] 10Analytics, 10Analytics-EventLogging, 10Better Use Of Data, 10EventBus, and 6 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (10fdans) [15:46:06] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Event counts from Mysql and Hive don't match. Refine is persisting data from crawlers. - https://phabricator.wikimedia.org/T210006 (10fdans) @chelsyx since you can filter the data with the is_bot tag, we're going to close this ticket unless you have any o... [15:59:56] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Event counts from Mysql and Hive don't match. Refine is persisting data from crawlers. - https://phabricator.wikimedia.org/T210006 (10chelsyx) @fdans No objection from me. Thank you for looking into this issue! [16:08:23] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Event counts from Mysql and Hive don't match. Refine is persisting data from crawlers. 
- https://phabricator.wikimedia.org/T210006 (10Nuria) 05Open→03Resolved [17:07:02] 10Analytics, 10Product-Analytics, 10Epic, 10User-Elukey: Add wikidata ids to data lake tables - https://phabricator.wikimedia.org/T221890 (10Groceryheist) Hi @fdans My ultimate goal is to identify, from a random sample of ~500,000 to ~50,000,000 edits from different language Wikipedias. 1. Which edits... [17:46:53] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 4 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (10Ottomata) Talked with Fabian in IRC. The idea of using wmf.releasename (which i... [19:40:11] 10Analytics, 10Growth-Team, 10Product-Analytics: Update ServerSideAccountCreation schema whitelist - https://phabricator.wikimedia.org/T222101 (10nettrom_WMF) [19:58:33] 10Analytics, 10Product-Analytics, 10Epic, 10User-Elukey: Add wikidata ids to data lake tables - https://phabricator.wikimedia.org/T221890 (10Nuria) >"map pages to wikidata ids" Is page_props teh table that holds this information? [20:00:04] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 3 others: Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (10Pchelolo) [20:00:10] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Services (done): EventGate service runner worker occasionally killed, usually during higher load - https://phabricator.wikimedia.org/T220661 (10Pchelolo) 05Open→03Resolved I believe that this can be closed after we are easily sustaining almost 7k events pe... 
[20:01:42] 10Analytics, 10Analytics-EventLogging, 10Operations, 10Performance-Team (Radar): Upgrade python-kafka - https://phabricator.wikimedia.org/T221848 (10kchapman) [20:09:36] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Operations, 10Performance-Team (Radar): Upgrade python-kafka - https://phabricator.wikimedia.org/T221848 (10Ottomata) [20:10:18] hi i'm trying to get started accessing the mariadb replicas [20:10:30] I found https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_replicas [20:10:39] which is pretty useful! [20:11:08] the one thing I can't figure out so far is where to find .../dblists/all.dblist [20:11:25] in other words, what goes in mw_config_path ? [20:11:48] I'm looking at the snippet in https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_replicas [20:18:07] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Operations, 10Performance-Team (Radar): Upgrade python-kafka - https://phabricator.wikimedia.org/T221848 (10Ottomata) Built 1.4.6. .debs and added them to https://apt.wikimedia.org/wikimedia/pool/main/p/python-kafka/. @gilles, can you do the u... [20:21:50] 10Analytics, 10Product-Analytics, 10Epic, 10User-Elukey: Add wikidata ids to data lake tables - https://phabricator.wikimedia.org/T221890 (10Groceryheist) @Nuria yes. My understanding is that they are when pp_propname == "wikibase_item" [20:26:43] nm I found what i'm looking for https://gerrit.wikimedia.org/r/admin/projects/operations/mediawiki-config [20:30:49] I'm still not sure whether I should use_x1 or not [21:59:10] groceryheist: you only want to use_x1 if you want to query data on notifications, Flow, or a few other custom extensions. See https://wikitech.wikimedia.org/wiki/MariaDB#Sections_and_shards—I need to work more on that doc page you linked :) [22:00:45] neilpquinn: yt? [22:00:54] nuria: yup, what's up? 
[22:01:24] neilpquinn: regarding this ticket: https://phabricator.wikimedia.org/T221338 [22:01:31] neilpquinn: see my comment at the bottom [22:02:27] neilpquinn: I think I am missing something cause i do not see how edits in content pages could be similar to wikistats1 measures but active editor numbers are totally off [22:02:39] nuria: yup, I need to respond, but I suspect the answer is that I am using `page_namespace_is_content_historical` [22:03:01] Rather than `page_namespace_is_content` [22:03:55] nuria: but if you point me to the queries the vetting uses I can give you a better answer :) [22:07:33] a-team: how do I find out what Python packages are available to a YARN PySpark session? It looks like I can use Matplotlib (because importing it works), but is there actually a list somewhere? [22:09:08] thanks neilpquinn [22:09:41] neilpquinn: these are aqs requests rather than selects on top of data lake [22:09:59] nuria: also, I think the 10000 editor discrepancy is from 2007 [22:10:27] whereas in more recent months it was 4-5%, which actually seems in line with the results of the vetting [22:10:46] But for our key metric, that much discrepancy is still a significant problem [22:11:13] neilpquinn: ah i see, can you update ticket with this info? would you using page_namespace_is_content_historical make the number of editors smaller or larger? [22:11:38] neilpquinn: as in "have namespaces moved OUT of being content namespaces?" [22:12:17] neilpquinn: if so, using historical would render a larger number of editors [22:13:07] sure: will update the ticket. I actually am thinking that `page_namespace_is_content_historical` is not the main issue (my second point is), but it would've decreased the number of editors, because the problem is that a lot of rows have that field null. [22:13:10] neilpquinn: how do your numbers differ from wikistats1? [22:14:13] nuria: I don't remember.
I believe I compared them at one point, but it would've been long ago [22:14:59] neilpquinn: so i understand, is it "more correct" to use page_namespace_is_content_historical? or rather page_namespace_is_content [22:16:05] nuria: both are philosophically defensible, but I prefer the first because that means that metrics are immutable. Same reason why I strongly prefer to count edits to deleted pages. [22:20:26] neilpquinn: i see, then let's edit ticket to point out issue with page_namespace_is_content_historical, i see the problem with a page not having ever "changed" namespaces, right? in that case page_namespace_is_content_historical might be null but page_namespace_is_content might be true [22:20:32] neilpquinn: is that the case? [22:22:10] nuria: no, that's not the case. also, what do you want me to edit? I already described the specific problems (including the various ones with `page_namespace_is_content_historical`) in the description [22:23:06] nuria: not the case because pages that have never been moved should have and usually do have both fields populated [22:23:28] neilpquinn: just add the reason why you use page_namespace_is_content_historical for metrics [22:23:59] sure [22:24:58] neilpquinn: i do not dispute that it needs correction, but the notion that to calculate metrics this field is the "correct" one is one you are postulating. [22:29:25] neilpquinn: is there a place where the active editor metric is defined? [22:29:28] nuria: sure, but actually I think my initial answer was wrong and `page_namespace_is_content_historical` is not the only issue.
`page_namespace_is_content` is affected too in a variety of cases, and I think the real issue is that the biggest discrepancies are far in the past, and I am more concerned about a 4% discrepancy in the current metrics [22:29:37] nuria: https://meta.wikimedia.org/wiki/Research:Active_editor [22:30:02] nuria: that was the product of a discussion between Erik Zachte, myself, and Leila some years ago and it does say "If feasible, an edit's namespace should be the one that the page was in when the edit was made, whether or not the page was later moved to another namespace." [22:30:38] there's a lot of complexity here :) [22:31:16] but I was able to work around the bug by manually joining to the project_namespace map and bypassing both of the `is_content` flags, so it's a bit less urgent [22:31:23] neilpquinn: ya, i know next to nothing about this, so +1 from me, erik z. did not do it this way and our vetting is done against older wikistats data so it makes sense numbers of active editors will differ if calculated this way [22:31:51] 10Analytics, 10Analytics-Data-Quality, 10Analytics-Kanban, 10Product-Analytics: Many revision events in mediawiki_history have missing page and namespace information - https://phabricator.wikimedia.org/T221338 (10Nuria) This is the active editor definition that Neil is using: https://meta.wikimedia.org/wik... [22:31:58] nuria: but I did investigate, and the difference between historical and current namespace is not the source of the issue :) [22:32:35] nuria: it's the fact that these fields are often null; I looked at a lot of individual cases to determine that [22:32:57] neilpquinn: which makes me think ..
mmmm [22:33:17] neilpquinn: cause "edits" vetting is done only on content namespace [22:33:25] neilpquinn: so something is wrong there [22:34:35] nuria: well, looking at https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2/Data_Quality/VettingPerProject#/media/File:Aqs-wikistats-very-active-editors-difference-2017-09.png, eswiki and jawiki were both showing more than a 5% difference in very active editors at the end of the graph period [22:34:59] I don't know what the difference is between, but that's pretty close in magnitude to the recent discrepancy I've seen [22:35:54] nuria: so it actually seems like my findings were fairly consistent with the results of the vetting [22:37:52] neilpquinn: es wiki has _per project_ the highest discrepancy [22:38:01] neilpquinn: but if you are counting active editors total [22:38:35] neilpquinn: enwiki data will have a lot more effect on the overall numbers [22:38:43] neilpquinn: so they should be quite lower [22:38:53] nuria: it's true [22:39:08] neilpquinn: that is why it does not add up, cause when you add "families" [22:40:28] nuria: well, perhaps the takeaway is that we need to put more work into rigorously defining these metrics and making sure all the vetting works the same way and agrees [22:40:36] neilpquinn: edit "totals" data is within 1%, let me re-look [22:41:36] nuria: because once the specific problems I identified are fixed, I'm confident in the numbers because they agree closely with the active editor numbers I was calculating from the MediaWiki replicas [22:42:06] neilpquinn: well vetting is done against wikistats1 metrics but regardless i agree, i think the issue is that we call "active editors" a couple of different things [22:42:20] nuria: If you really want to dig into this, you could also look at my discrepancy analysis: https://github.com/wikimedia-research/2019-02-active-editors-discrepancy/blob/master/analysis.ipynb :) [22:43:16] nuria: yeah, I think there's a lot of work we should do under the heading
of "metrics consistency". I think Kate would be happy to have me spend more time on it :) [22:43:30] neilpquinn: that link on ticket woudl also be good [22:43:33] *would [22:43:58] nuria: it's linked in https://phabricator.wikimedia.org/T218819, which is linked in the ticket ;) [22:44:35] neilpquinn: ah i see [23:07:36] 10Analytics: Mediawiki-history release - Backlog - https://phabricator.wikimedia.org/T221828 (10Neil_P._Quinn_WMF) I feel like this would be better served by a "mediawiki_history" or "Analytics-mediawiki-history" tag :) [23:08:00] 10Analytics, 10EventBus, 10Operations, 10serviceops, and 4 others: Enabling api-request eventgate to group1 caused minor service disruptions - https://phabricator.wikimedia.org/T218255 (10mobrovac)
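Editor's sketch of the point neilpquinn makes in the discussion above about T221338: rows where an `is_content` flag is null drop out of a content-only filter and lower the editor count, while a lookup against a namespace map avoids depending on the flag. The sample rows are invented. The `contentNamespaces` set is an assumed stand-in for the project_namespace map, and the use of a `page_namespace_historical` field for the lookup is also an assumption, since the chat does not say which column the workaround joined on.

```javascript
// Hypothetical sample rows; the values are invented for illustration only.
const edits = [
  { user: 'A', page_namespace_historical: 0, page_namespace_is_content_historical: true },
  { user: 'B', page_namespace_historical: 0, page_namespace_is_content_historical: null },
  { user: 'C', page_namespace_historical: 1, page_namespace_is_content_historical: false },
];

// Assumed stand-in for the project_namespace map: ids of content namespaces.
const contentNamespaces = new Set([0]);

// Counting editors via the flag: a null flag is not true, so user B is lost.
function editorsByFlag(rows) {
  return new Set(
    rows.filter((r) => r.page_namespace_is_content_historical === true).map((r) => r.user)
  );
}

// Counting editors via the namespace map: the flag is bypassed entirely.
function editorsByNamespaceMap(rows, contentNs) {
  return new Set(
    rows.filter((r) => contentNs.has(r.page_namespace_historical)).map((r) => r.user)
  );
}

console.log([...editorsByFlag(edits)]);                            // [ 'A' ]
console.log([...editorsByNamespaceMap(edits, contentNamespaces)]); // [ 'A', 'B' ]
```

The same effect appears in SQL-style queries, where a `WHERE flag` condition silently excludes rows whose flag is NULL.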