[09:29:48] (PS42) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[09:37:32] (PS34) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[09:40:05] (CR) jenkins-bot: [V: -1] [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[09:40:20] Arf ... Mistake - sorry
[09:41:02] (PS35) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[09:47:25] (CR) Joal: [C: -1] "One line change not needed (comment inline), rest is good !" (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/323249 (https://phabricator.wikimedia.org/T150990) (owner: Nuria)
[10:01:22] (PS36) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[10:59:43] joal: o/
[11:00:05] do you have a minute for a webreq_stat related question?
[11:45:29] Hi elukey
[11:45:38] Sorry I was on the phone
[11:45:43] I have time yes :)
[11:47:44] no worries :)
[11:47:58] so I was checking the new values after the refinery deployment
[11:48:22] and I realized that I didn't know the difference between webrequest_stats and webrequest_stats_hourly
[11:48:27] the former contains also hostnames
[11:48:30] that are useful
[11:48:46] but count_different and percent_different in webrequest_stats returns strange value
[11:48:51] *values
[11:50:03] for example: I checked today and yesterday
[11:51:58] Hey joal! I saw https://gerrit.wikimedia.org/r/#/c/305989/ is merged! :) When is it set to 'go live' ?
[11:52:48] like percent_different set to -86%
[11:52:54] that doesn't make sense to me
[11:53:03] so I wanted to ask the difference between the tables
[11:53:12] (I haven't checked the Hive sql yet)
[11:58:30] So elukey, difference is usage mostly
[11:59:56] webrequest_stats_hourly is a summarised view allowing for alerts
[12:00:16] While webrequest_stats is the detailed set of stats
[12:01:04] elukey: As for the strange values in percent different, I'm going to investigate
[12:01:16] let me give you my query
[12:01:21] maybe I am checking something wrong
[12:03:13] Hi addshore, for your patch to go live we need to deploy
[12:03:35] addshore: There is no deploy planned as of now, but we can plan one :)
[12:03:54] Great (just so I know when to look for the data appearing)!
[12:04:17] addshore: I'll try to remember to ping when we deploy :)
[12:05:32] joal: select hour,hostname,count_incomplete,count_different,percent_different from webrequest_sequence_stats where webrequest_source = 'text' and year = 2016 and month = 11 and day = 24 and (count_different != 0 OR count_incomplete != 0) order by hour limit 1000;
[12:05:54] I get weird records like
[12:05:55] 5 cp1055.eqiad.wmnet 0 36237533 -88.42849283870784
[12:08:35] elukey: Haaa :)
[12:08:56] elukey: this is a classical issue, solved in the hourly table at extraction time
[12:09:16] elukey: I ran that: select * from webrequest_sequence_stats where webrequest_source = 'text' and year = 2016 and month = 11 and day = 24 AND hour = 5 and hostname = 'cp1055.eqiad.wmnet';
[12:10:51] mmmm
[12:11:14] First column: webrequest_sequence_stats.sequence_min is 0
[12:12:10] This means this cache instance of VarnishKafka has been restarted during that hour
[12:12:22] And it fakes huge data loss
[12:16:40] elukey: makes sense?
[12:20:26] Great!
[12:24:26] joal: yes it does thanks, I am checking https://grafana.wikimedia.org/dashboard/db/varnishkafka?from=now-72h&to=now and vk on 1055 has been flapping a lot (not sure why, investigating). But why -88.42849283870784 ?
[12:24:45] Analytics-Tech-community-metrics: No korma data updates for last two weeks - https://phabricator.wikimedia.org/T151503#2819464 (Lcanasdiaz) a:Lcanasdiaz
[12:25:03] elukey: This is related to how we compute the lost percent
[12:26:10] okok thanks :)
[12:26:16] now I am going to check cp1055
[12:26:49] elukey: to compute the percent lost, we use seq_max - seq_min as expected number of lines
[12:27:04] Obviously with min = 0, this expected number is wrong
[12:27:34] elukey: To prevent this from appearing in aggregated hourly stats, we sum the percent_loos where seq_min != 0
[12:27:40] good?
[12:32:57] ahhhhhh okok
[12:32:59] thanks :)
[12:33:04] no prob :)
[12:33:31] speaking of kafka, I believe that we could launch preferred replica election to re-add 1022
[12:33:45] I trust you for that :)
[12:33:57] but today is thanksgiving so I am not sure if I can
[12:34:02] asking the other folks
[12:34:05] sure
[12:34:39] I mean thanksgiving is about turkey and all, not sure there is anything related to western europe writers :)
[12:35:15] yes I think I'll proceed, it was only because on US time nobody is expected to work and things shouldn't blow up :P)
[12:35:29] I knew, just making fun :)
[12:36:12] :P
[12:40:14] all good!
[12:40:53] Awesome, thanks elukey :)
[12:48:39] this is weird https://grafana.wikimedia.org/dashboard/db/varnishkafka?from=now-7d&to=now&panelId=13&fullscreen
[12:48:46] some vks seem to be flapping
[12:48:54] and from the logs I can see Log re-acquired
[12:49:33] elukey: I think I don't know what that means :(
[12:54:42] ah sorry it is what vk and other daemons do when the Varnish shm log is abandoned
[12:54:48] maybe for a Varnish restart
[13:17:34] * elukey lunch!
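Editor's sketch of the sequence-number check joal describes above, written in Python rather than the Hive SQL the real job uses. Everything named here is an assumption for illustration: the `HostStats` record, its field names, and the formula `(actual / expected - 1) * 100`, which was picked only so that missing lines show up as a negative percentage like the -88.4 in the log. The hourly aggregation is likewise a guess at "ignore hosts whose seq_min is 0", not the actual refinery query.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HostStats:
    """Hypothetical per-host, per-hour row (field names are illustrative)."""
    hostname: str
    sequence_min: int
    sequence_max: int
    count_actual: int  # lines actually received for this host and hour


def percent_different(s: HostStats) -> float:
    # Expected line count as stated in the chat: seq_max - seq_min.
    # When the producer restarted, seq_min is 0 and this is far too large.
    expected = s.sequence_max - s.sequence_min
    return (s.count_actual / expected - 1.0) * 100.0


def hourly_percent_different(rows: Iterable[HostStats]) -> float:
    # Assumed aggregation: drop hosts whose sequence restarted (seq_min == 0),
    # then compare total received lines with total expected lines.
    trusted = [s for s in rows if s.sequence_min != 0]
    expected = sum(s.sequence_max - s.sequence_min for s in trusted)
    actual = sum(s.count_actual for s in trusted)
    if expected == 0:
        return 0.0
    return (actual / expected - 1.0) * 100.0


# Made-up numbers: one healthy host and one that restarted during the hour.
healthy = HostStats("host-a", sequence_min=5000, sequence_max=6000, count_actual=1000)
restarted = HostStats("host-b", sequence_min=0, sequence_max=4000, count_actual=1000)
```

With these made-up rows, `percent_different(restarted)` reports a large fake loss even though no lines need be missing, while `hourly_percent_different([healthy, restarted])` ignores the restarted host.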
[13:27:41] (PS43) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[13:28:48] (PS37) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[13:41:12] Analytics-Tech-community-metrics: Provide equivalent of korma's "Gerrit review queue" (listing the 'worst' SCR repositories) in Kibana - https://phabricator.wikimedia.org/T151488#2820907 (Aklapper)
[13:48:47] (PS2) Joal: [WIP] Update edit history to GraphFrames [analytics/refinery/source] - https://gerrit.wikimedia.org/r/322260
[14:02:28] Analytics-Tech-community-metrics: Provide equivalent of "SCR: Open changesets vs. Open changesets waiting for review (CR0 / CR+1)" in Kibana - https://phabricator.wikimedia.org/T151555#2821002 (Aklapper)
[14:02:31] Analytics-Tech-community-metrics: Provide equivalent of "SCR: Distribution of open changesets (by date of submission)" in Kibana - https://phabricator.wikimedia.org/T151556#2821015 (Aklapper)
[14:02:34] Analytics-Tech-community-metrics: Provide equivalent of "SCR: Age of open changesets (monthly snapshots)" in Kibana - https://phabricator.wikimedia.org/T151557#2821028 (Aklapper)
[14:02:37] Analytics-Tech-community-metrics: Provide equivalent of "SCR: Code review users vs. Code review committers" in Kibana - https://phabricator.wikimedia.org/T151558#2821041 (Aklapper)
[14:02:40] Analytics-Tech-community-metrics: Provide equivalent of "SCR: People uploading patchsets vs. Reviewers per month" in Kibana - https://phabricator.wikimedia.org/T151559#2821054 (Aklapper)
[14:02:42] Analytics-Tech-community-metrics: Provide equivalent of "SCR: Oldest open Gerrit changesets without code review" in Kibana - https://phabricator.wikimedia.org/T151560#2821067 (Aklapper)
[14:09:09] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec-2016): Identify Wikimedia's most important/used info panels in korma.wmflabs.org - https://phabricator.wikimedia.org/T132421#2821086 (Aklapper)
[14:11:19] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec-2016): Identify Wikimedia's most important/used info panels in korma.wmflabs.org - https://phabricator.wikimedia.org/T132421#2821121 (Aklapper) Open>stalled Created dedicated tasks for regressions and linked them from this task's desc...
[14:14:11] Analytics-Tech-community-metrics: Provide equivalent of "SCR: People uploading patchsets vs. Reviewers per month" in Kibana - https://phabricator.wikimedia.org/T151559#2821147 (Aklapper)
[14:14:13] Analytics-Tech-community-metrics: Provide equivalent of "SCR: Age of open changesets (monthly snapshots)" in Kibana - https://phabricator.wikimedia.org/T151557#2821149 (Aklapper)
[14:14:15] Analytics-Tech-community-metrics: Provide equivalent of "SCR: Code review users vs. Code review committers" in Kibana - https://phabricator.wikimedia.org/T151558#2821148 (Aklapper)
[14:14:17] Analytics-Tech-community-metrics, Developer-Relations, Epic: Complete migration to new Bitergia's development dashboard (and then kill korma.wmflabs.org) - https://phabricator.wikimedia.org/T137997#2821146 (Aklapper)
[14:15:28] Analytics-Tech-community-metrics, Developer-Relations, Epic: Complete migration to new Bitergia's development dashboard (and then kill korma.wmflabs.org) - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[14:26:26] joal: https://phabricator.wikimedia.org/T151563 - one issue is v
[14:26:28] https://phabricator.wikimedia.org/T151563
[14:28:02] wow elukey, very interesting !
[14:44:17] hi joal :]
[14:44:20] starting now
[14:44:24] Hi mforns
[14:44:47] Currently rerunning full job with corrected things
[14:47:47] joal, I'm looking at the 2 changes you committed to https://gerrit.wikimedia.org/r/#/c/301837
[14:48:16] k mforns
[14:52:15] Analytics-Tech-community-metrics: No korma data updates for last two weeks - https://phabricator.wikimedia.org/T151503#2821229 (Aklapper)
[14:59:05] mforns: The null pageIds is confirmed solved
[14:59:16] joal, o/
[14:59:30] mforns: The userGroups from table, it's a bit weird
[14:59:41] mmmm
[14:59:50] mforns: do we take a minute to discuss that?
[14:59:56] joal, sure
[14:59:58] batcave!
[15:00:26] Yay
[15:15:34] Analytics-Tech-community-metrics, Regression: Git repo blacklist config not applied on wikimedia.biterg.io - https://phabricator.wikimedia.org/T146135#2821271 (Aklapper)
[15:26:57] Analytics-Kanban, Operations, Traffic: Varnishlog with Start timestamp but no Resp one causing data consistency check alarms - https://phabricator.wikimedia.org/T148412#2821292 (ema) Open>Resolved >>! In T148412#2781980, @elukey wrote: > Last but not the least, no alarms were fired for uploa...
[15:36:28] (PS44) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[15:36:33] mforns: if you don't mind double checking --^
[15:36:45] joal, sure
[15:42:13] joal, looks good! thanks :]
[15:42:21] Cool :)
[15:42:41] mforns: Will update my patches, and then rerun on data - will be easier for you to vet
[15:42:59] joal, OK
[15:43:38] Thanks mforns :)
[15:44:03] joal, thank you!
[15:44:34] (PS38) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[15:49:58] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#2821343 (Aklapper)
[15:50:34] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1695368 (Aklapper) a:Anmolkalia>None @Anmolkalia: I am resetting the assignee of this task because there have not been signs of progress lately (please correct me if I'm wrong). Resetti...
[15:51:18] Rerunning simplewiki data mforns :)
[15:51:38] awesome :]
[15:51:43] this should take minutes
[15:52:47] people do we want to do standup?
[15:52:55] otherwise we can skip
[15:53:31] elukey: Funny you ask, we were saying with mforns that we would show up in case you do :D
[15:53:38] ahhahaah
[15:54:05] hehe
[15:54:11] we can skip if you don't have anything super important to share, otherwise we can meet
[15:54:27] I basically worked the whole day on varnishkafka
[15:54:41] I feel bored when I listen to myself
[15:54:46] it seems that I do that all the time :P
[15:54:47] elukey: moving forward on scala (actually getting closer and closer to satisfaction I think <- mforns ?
[15:54:49] what a slacker
[15:55:03] hehehe
[15:55:03] (slacker referred to me :)
[15:55:19] elukey: Having worked months on scala for history, I share the feeling !
[15:56:15] you also have been listening to me during these months! Hard work
[15:56:16] :P
[15:56:23] huhuhu :)
[15:56:34] xD
[15:57:29] (PS3) Joal: [WIP] Update edit history to GraphFrames [analytics/refinery/source] - https://gerrit.wikimedia.org/r/322260
[15:58:16] Looks like we don't really need standup
[15:58:32] mforns: new data, same place :)
[15:58:52] joal, elukey, my standup: yesterday night I was working on a presentation that I want to do at a local meetup, today I will re-vet the edit data after latest changes and read about KafkaSSE
[15:59:05] joal, thanks!
[15:59:12] cool mforns :)
[15:59:27] mforns: nice! Can I ask what local meetup?
[15:59:42] elukey, python-data (palma de mallorca)
[16:00:06] nice :)
[16:00:23] Thanks great mforns :)
[16:00:24] elukey, it's actually pyData:
[16:00:24] https://www.meetup.com/PyData-Mallorca/
[16:05:35] Analytics-Tech-community-metrics, Performance: Improve performance of MediaWikiAnalysis - https://phabricator.wikimedia.org/T114439#2821364 (Aklapper)
[16:06:41] Analytics-Tech-community-metrics: In MediaWikiAnalysis, implement some missing information from the MediaWiki API - https://phabricator.wikimedia.org/T114440#2821365 (Aklapper)
[16:07:16] weird result for bots mforns :(
[16:07:20] batcave quickly?
[16:07:24] joal, sure
[16:08:17] Analytics-Tech-community-metrics: In MediaWikiAnalysis, implement some missing information from the MediaWiki API - https://phabricator.wikimedia.org/T114440#1695405 (Aklapper) This task nowadays feels pretty vague to me. It's likely a good task for someone about to dive into the codebase anyway, but apart f...
[16:09:39] Analytics-Tech-community-metrics: Make MediaWikiAnalysis not index IP address accounts (and consider removing MediaWiki accounts from our DB) - https://phabricator.wikimedia.org/T149482#2821368 (Aklapper)
[16:44:18] Analytics-Tech-community-metrics, Developer-Relations: Who are the top 50 independent contributors and what do they need from the WMF? - https://phabricator.wikimedia.org/T85600#2821463 (Aklapper)
[16:44:20] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec-2016): Fix incorrect mailing list activity of AKlapper (=Phabricator) in Technical Community Metrics user data - https://phabricator.wikimedia.org/T132907#2821459 (Aklapper) Open>Resolved Fixed by https://github.com/Bitergia/mediawiki...
[16:49:34] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#2821504 (Acs) >>! In T114437#2821343, @Aklapper wrote: > @Anmolkalia: I am resetting the assignee of this task because there have not been signs of progress lately (please correct me if I'm wron...
[17:25:52] * elukey afk!
[17:25:54] o/
[17:27:47] o/
[18:29:33] (PS45) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[18:32:06] (PS39) Joal: [WIP] Join and denormalize all histories into one [analytics/refinery/source] - https://gerrit.wikimedia.org/r/307903 (owner: Milimetric)
[18:44:26] (PS4) Joal: [WIP] Update edit history to GraphFrames [analytics/refinery/source] - https://gerrit.wikimedia.org/r/322260
[19:57:58] Analytics-Tech-community-metrics: In MediaWikiAnalysis, implement some missing information from the MediaWiki API - https://phabricator.wikimedia.org/T114440#2821989 (Qgil) Agreed.
[21:07:19] Analytics-Tech-community-metrics: Make Perceval or MediaWikiAnalysis not index IP address accounts (and consider removing MediaWiki accounts from our DB) - https://phabricator.wikimedia.org/T149482#2822103 (Aklapper)
[22:21:20] Analytics-Tech-community-metrics: In MediaWikiAnalysis, implement some missing information from the MediaWiki API - https://phabricator.wikimedia.org/T114440#2822167 (jgbarah) I guess we better close this task. Now, data retrieval for MediaWiki is done with a [[ http://perceval.readthedocs.io/en/latest/perce...
[23:13:57] Analytics-Tech-community-metrics, Performance: Improve performance of MediaWikiAnalysis - https://phabricator.wikimedia.org/T114439#2822197 (Aklapper)
[23:14:00] Analytics-Tech-community-metrics: In MediaWikiAnalysis, implement some missing information from the MediaWiki API - https://phabricator.wikimedia.org/T114440#2822195 (Aklapper) Open>declined @jgbarah: Hah, thanks! I was not entirely sure about that. I'll close the remaining MediaWikiAnalysis tasks an...
[23:15:51] Analytics-Tech-community-metrics: Make Perceval not index IP address accounts (and consider removing MediaWiki accounts from our DB) - https://phabricator.wikimedia.org/T149482#2822202 (Aklapper)
[23:15:54] Analytics-Tech-community-metrics, Performance: Improve performance of MediaWikiAnalysis - https://phabricator.wikimedia.org/T114439#1695392 (Aklapper) Open>declined Declining this task as //MediaWikiAnalysis// has been superseded by the [[ https://github.com/grimoirelab/perceval/blob/master/perce...
[23:15:57] Analytics-Tech-community-metrics: Misc. improvements to MediaWikiAnalysis (which is part of the MetricsGrimoire toolset) - https://phabricator.wikimedia.org/T89135#2822206 (Aklapper)
[23:16:01] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#2822207 (Aklapper) Open>declined Declining this task as //MediaWikiAnalysis// has been superseded by the [[ https://github.com/grimoirelab/perceval/blob/master/perceval/backends/core/med...
[23:16:04] Analytics-Tech-community-metrics: Misc. improvements to MediaWikiAnalysis (which is part of the MetricsGrimoire toolset) - https://phabricator.wikimedia.org/T89135#1027979 (Aklapper)