[00:10:08] Quarry, Labs: Long-running Quarry query (querry?) produces strangely incorrect results - https://phabricator.wikimedia.org/T135087#2552339 (AlexMonk-WMF)
[03:18:03] (PS1) Yuvipanda: Use configured redis host for sessions [analytics/quarry/web] - https://gerrit.wikimedia.org/r/304758
[06:00:58] (PS1) Yuvipanda: Set URL prefix the stupid way [analytics/quarry/web] - https://gerrit.wikimedia.org/r/304764
[06:01:06] (CR) jenkins-bot: [V: -1] Set URL prefix the stupid way [analytics/quarry/web] - https://gerrit.wikimedia.org/r/304764 (owner: Yuvipanda)
[06:04:29] (PS2) Yuvipanda: Use configured redis host for sessions [analytics/quarry/web] - https://gerrit.wikimedia.org/r/304758
[06:04:32] (PS2) Yuvipanda: Set URL prefix the stupid way [analytics/quarry/web] - https://gerrit.wikimedia.org/r/304764
[06:09:37] (PS3) Yuvipanda: Set URL prefix the stupid way [analytics/quarry/web] - https://gerrit.wikimedia.org/r/304764
[07:41:20] does anybody have a simple script for dealing with the YYYYMMDDHHMMSS format of date/time in the tables? I often would like to convert it to Unix-timestap or so to easily do calculations and then convert it back. Easy to write, I assume, but if it is already there and tested, I'd rather take that :)
[07:52:42] …OK, R has most build in. If there is a nice collection of common problem solving scripts I'd take it nevertheless.
[11:05:54] taking a break a-team
[12:33:43] Analytics: Review the current automatic recommendation mechanism - https://phabricator.wikimedia.org/T142831#2553083 (Physikerwelt)
[13:46:30] (PS20) Mforns: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548) (owner: Joal)
[14:13:36] hey joal if you're around I can show you what problem I was having with sqoop
[14:13:45] Hi milimetric :)
[14:13:49] I'm here !
[14:14:09] ok, so it's about the password file / options file
[14:14:10] milimetric: batcave?
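Re the 07:41:20 question about YYYYMMDDHHMMSS values: a minimal Python sketch of the round trip, not a tested team script. It assumes the table timestamps are in UTC; the function names `mw_to_unix` and `unix_to_mw` are made up for illustration.

```python
from datetime import datetime, timezone

# strptime/strftime pattern for the 14-digit YYYYMMDDHHMMSS form.
MW_FORMAT = "%Y%m%d%H%M%S"


def mw_to_unix(ts: str) -> int:
    """Convert a YYYYMMDDHHMMSS string (assumed UTC) to Unix seconds."""
    dt = datetime.strptime(ts, MW_FORMAT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def unix_to_mw(seconds: int) -> str:
    """Convert Unix seconds back to a YYYYMMDDHHMMSS string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(MW_FORMAT)


if __name__ == "__main__":
    start = mw_to_unix("20160815074120")
    # Do the arithmetic in seconds, then convert back: one hour later.
    print(unix_to_mw(start + 3600))
```

Doing the arithmetic on integer seconds and converting back at the end avoids having to handle day or month rollover by hand.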
[14:14:13] sure
[14:40:14] Analytics-Kanban, EventBus, Patch-For-Review: Upgrade kafka main clusters to 0.9 - https://phabricator.wikimedia.org/T138265#2553409 (Ottomata) @Pchelolo we are ready to go for main-eqiad! Ping me when you are online so you can babysit change-prop.
[14:46:39] running home for standup...
[14:55:20] milimetric: doh!
[14:55:22] sorry for the trouble.
[14:55:32] the password thing?
[14:55:44] no trouble at all, I wasn't getting it to work either
[14:55:53] Joseph figured it out just now
[14:56:17] milimetric: you got a quoting issue in that patch
[14:56:28] use single quotes around ${password} since it is already in double quotes
[14:57:03] oh sorry, fixing
[15:02:34] Analytics-Kanban: User History: Make it work for enwiki - https://phabricator.wikimedia.org/T142472#2553459 (Nuria) Open>Resolved
[15:02:50] Analytics-Kanban: Update Refinery to solve dependencies conflicts - https://phabricator.wikimedia.org/T142717#2553462 (Nuria) Open>Resolved
[15:07:35] Analytics-Kanban: Continue New AQS Loading - https://phabricator.wikimedia.org/T140866#2553463 (JAllemandou) For my fellow an-engineers to replace me while I'm on holidays: https://etherpad.wikimedia.org/p/backfilling_aqs
[15:07:46] Analytics-Kanban: Wikistats 2.0. Edit Reports: Setting up a pipeline to source Historical Edit Data into hdfs {lama} - https://phabricator.wikimedia.org/T130256#2553465 (Nuria)
[15:10:17] Is it know the pageview dumps aren't being updating?
[15:10:26] Since the 5th
[15:10:51] Analytics-Kanban: User History: Populate the causedByUserId and causedByUserName fields in 'create' states. - https://phabricator.wikimedia.org/T139761#2553466 (Nuria) Open>Resolved
[15:10:53] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for simplewiki - https://phabricator.wikimedia.org/T134790#2553467 (Nuria)
[15:10:53] https://lists.wikimedia.org/pipermail/analytics/2016-August/005339.html
[15:11:05] Reedy: --^
[15:11:10] Thanks!
[15:11:18] Analytics-Kanban: Scale MySQL edit history reconstruction data extraction - https://phabricator.wikimedia.org/T134791#2553469 (Nuria)
[15:11:20] Analytics-Kanban: Scale scala algorithms using graph partitioning - https://phabricator.wikimedia.org/T141548#2553468 (Nuria) Open>Resolved
[15:23:41] Analytics, Dumps-Generation, Labs: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2553492 (Nuria)
[15:24:12] Analytics, Labs: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2553498 (Ottomata)
[15:44:59] Analytics, Operations, Ops-Access-Requests: Add analytics team members to group aqs-admins to be able to deploy pageview APi - https://phabricator.wikimedia.org/T142101#2553564 (RobH) Open>stalled The new deploy-aqs group allows for sudo/deployment access for these services. As such, this re...
[15:48:18] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2553577 (TJones)
[15:53:16] Analytics: Substantial amount of ip addresses in 1:1000 sampled squid logs does not resolve into geo data, from Nov 2013 onwards - https://phabricator.wikimedia.org/T90235#1054202 (Nuria) We are about to remove sample logs in favor of new pageview definition pageview counts. This report has the newest data...
[15:53:25] Analytics: Substantial amount of ip addresses in 1:1000 sampled squid logs does not resolve into geo data, from Nov 2013 onwards - https://phabricator.wikimedia.org/T90235#2553598 (Nuria) Open>declined
[15:55:58] Analytics, Datasets-Webstatscollector: www.f / foundationwiki (wikimediafoundation.org) pageviews low, underreporting? - https://phabricator.wikimedia.org/T51266#562202 (Nuria) This data is now on pageview API: http://tools.wmflabs.org/siteviews/?platform=all-access&source=pageviews&agent=user&range=late...
[15:56:05] Analytics, Datasets-Webstatscollector: www.f / foundationwiki (wikimediafoundation.org) pageviews low, underreporting? - https://phabricator.wikimedia.org/T51266#2553610 (Nuria) Open>Resolved
[15:56:56] Analytics: Review EL requests from Mediaviewer - https://phabricator.wikimedia.org/T88849#2553629 (Nuria) Open>Resolved
[15:58:31] Analytics, Multimedia: Add mediacounts to pageview API - https://phabricator.wikimedia.org/T88775#1020132 (Nuria) Concern here is datasize, we do have counts for media files that could be published to AQS.
[16:00:07] Analytics, Fundraising-Backlog, Wikimedia-Fundraising: Public dashboards for CentralNotice and Fundraising - https://phabricator.wikimedia.org/T88744#1019258 (Nuria) Removing analytics from ticket as it doesn't have any actionables for team.
[16:06:27] nuria_: google pushed me, logging back in
[16:08:33] Analytics, Operations, Ops-Access-Requests: Add analytics team members to group aqs-admins to be able to deploy pageview APi - https://phabricator.wikimedia.org/T142101#2553701 (Ottomata) Add me and Luca as well please. We won't gain any extra sudo permissions, but this group will be used to grant a...
[16:10:34] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2553703 (debt) I had been trying to understand why the [[https://www.wikipedia.org | wikipedia.org]] portal pageviews had been on the rise recently and @Tbayer p...
[16:19:11] (PS21) Joal: [WIP] Refactor Mediawiki History scala code [analytics/refinery/source] - https://gerrit.wikimedia.org/r/301837 (https://phabricator.wikimedia.org/T141548)
[16:45:22] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2553821 (Tbayer) >>! In T141506#2553703, @debt wrote: > I had been trying to understand why the [[https://www.wikipedia.org | wikipedia.org]] portal pageviews ha...
[16:50:57] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2553869 (Nuria) @debt: Maybe you want to open another ticket? Make sure to note where are you getting your pageviews from, is it pageview API?
[17:01:29] ottomata: soooo, questionnn...
[17:01:35] Analytics, Datasets-Webstatscollector: www.f / foundationwiki (wikimediafoundation.org) pageviews low, underreporting? - https://phabricator.wikimedia.org/T51266#2553883 (Nemo_bis) Figures are the same, only some 20 pageviews/day for the main page: https://tools.wmflabs.org/pageviews/?project=wikimediafo...
[17:01:41] ottomata: can i ssh from 1002 into 1003 directly?
[17:01:47] ottomata: and thsu transfer a file?
[17:01:51] *thus
[17:01:57] Analytics, Datasets-Webstatscollector, Pageviews-API: www.f / foundationwiki (wikimediafoundation.org) pageviews low, underreporting? - https://phabricator.wikimedia.org/T51266#2553886 (Nemo_bis)
[17:01:59] no
[17:02:02] nuria_: rsync.
[17:02:05] with rsync module
[17:02:09] ::srv
[17:02:25] https://wikitech.wikimedia.org/wiki/Analytics/FAQ
[17:02:48] ottomata: ajajam okeis
[17:06:56] brb
[17:07:20] Analytics, Datasets-Webstatscollector, Pageviews-API: www.f / foundationwiki (wikimediafoundation.org) pageviews low, underreporting? - https://phabricator.wikimedia.org/T51266#2553927 (Heather) If you look from that page, it's very different: https://tools.wmflabs.org/pageviews/?project=wikimediafou...
[17:15:08] Analytics, Datasets-Webstatscollector, Pageviews-API: www.f / foundationwiki (wikimediafoundation.org) pageviews low, underreporting? - https://phabricator.wikimedia.org/T51266#2553963 (Nuria) Nemo_bis: I doubt there is an issue . The wikimediafoundation redirects to "home" rather than main page upon...
[17:19:49] Analytics, Datasets-Webstatscollector, Pageviews-API: www.f / foundationwiki (wikimediafoundation.org) pageviews low, underreporting? - https://phabricator.wikimedia.org/T51266#2553972 (Nemo_bis) Good point, thanks for correcting me.
[17:44:23] Analytics, Datasets-Webstatscollector, Pageviews-API: www.f / foundationwiki (wikimediafoundation.org) pageviews low, underreporting? - https://phabricator.wikimedia.org/T51266#2554053 (Nuria) Sorry ping @Nemo_bis again, see prior comment
[17:44:46] Analytics, Datasets-Webstatscollector, Pageviews-API: www.f / foundationwiki (wikimediafoundation.org) pageviews low, underreporting? - https://phabricator.wikimedia.org/T51266#2554055 (Nuria) Closing ticket.
[17:54:14] logging a-team, will see you folks in 2 weeks !
[17:54:27] laters joal, will keep an aye on compactions and loading
[17:54:33] thanks :)
[17:54:48] have an awesome time!
[17:54:54] an aye?!
[17:54:55] eye*
[17:54:56] haha
[18:11:30] ottomata: i think i broke something
[18:11:56] nuria_: sorry, just started kafka upgradd!
[18:12:09] ottomata: k, np i broke something real small i think
[18:43:27] ottomata: let me know when you are available
[18:45:56] ottomata: hey ja go ahead
[18:56:45] nuria_: ^
[18:56:51] ottomata: yessir
[18:57:16] ottomata: soo I think i deleted limn-public-data from 1003/srv
[18:57:26] ottomata: it was a symlink to somewhere but where?
[18:57:41] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2554503 (mpopov) >>! In T141506#2553869, @Nuria wrote: > @debt: Maybe you want to open another ticket? Make sure to note where are you getting your pageviews fro...
[18:59:26] hm, nuria_ not sure about that
[18:59:33] milimetric: ^^^
[19:00:11] ottomata: can we back up whatever is on /a/limn-public-data on 1001 in case ahem it gets deleted by rsync?
[19:00:29] uh
[19:00:52] i'm trying to remember
[19:04:39] nuria_, milimetric, I think the latest puppet code uses /a/reportupdater/output, so I guess /srv/limn-public-data was pointing there
[19:05:00] also, I think rsync is not in delete mode, so no problem
[19:05:28] mforns: in meeting
[19:05:32] mforns: give me a sec
[19:05:32] ok
[19:06:41] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2554629 (debt) @Tbayer - the portal showed a spike to about 40.5Million pageviews on July 20th, so we felt it there a bit too, but ours did start in June.
[19:07:15] nuria_: backin gup
[19:07:19] yes, i see it is still on stat1001
[19:08:17] I'd say rsync is watchin /a/reportupdater/output
[19:08:29] ja, confirmed
[19:08:43] we changed that in puppet, /srv/limn-public-data was only left there for backwards compat
[19:08:46] nuria_: all is well, i see /srv/reportupdater/output on stat1003 and also /var/www/limn-public-data on stat1002
[19:08:58] thanks mforns i did not remember that bit
[19:09:28] thanks for confirming :]
[19:16:54] mforns: ahahahah
[19:17:06] :]
[19:17:24] mforns: see, that is why i do not want permits to nothing
[19:17:28] permits are stressful
[19:17:31] hehe
[19:17:46] I would still restore /a/limn-public-data from 1003, right?
[19:17:50] *from 1001
[19:18:01] milimetric: it was a symlink
[19:18:30] milimetric: to /srv/reportupdater/output
[19:18:31] no, I think /srv/limn-public-data was the symlink
[19:18:51] yes yes, that is what i tried to say
[19:19:03] I think there's other stuff that was writing to /a/limn-public-data besides reportupdater, no?
[19:19:10] cron jobs that Dario was running?
[19:19:17] or did all those get migrated?
[19:19:38] has something changed with /a/limn-public-data?
[19:19:50] it was deleted
[19:19:58] I thought the only thing changed was /srv/limn-public-data
[19:20:00] on 1003 i deleted the symlink
[19:20:11] yes, on /srv/limn-public-data
[19:20:20] no, but /a/limn-public-data is deleted too
[19:20:33] oh :)
[19:20:36] it must have deleted the target too
[19:20:41] oh /a is a symlink to /srv :)
[19:20:47] aaaah ok
[19:20:49] but teh stuff that there was under is on /srv/reportupdater/output/
[19:20:57] mforns: no content was deleted
[19:20:59] yes, ok
[19:21:05] they were both symlinks :)
[19:21:10] ok ok
[19:21:27] because /srv/limn-public-data pointed to /srv/reportupdater/output and /a pointed to /srv
[19:21:47] I think all is well, and maybe less confusing :)
[19:21:48] milimetric, makes sense
[19:22:20] mforns: so no need to recreate symlinks then?
[19:22:25] so the only thing to do is recreate /srv/limn-public-data to /srv/reportupdater/output no?
[19:29:11] mforns: but do we need it?
[19:29:20] nuria_, not sure
[19:29:38] mforns: i guess we might to later rsync stuff to 1001
[19:29:54] mmm
[19:33:07] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2554790 (Nuria) @debt: > Since the pageviews UDF doesn't apply to the wikipedia.org portal page I assume you are talking about this page: https://www.wikipedia....
[19:33:40] mforns: ok, recreated link
[19:33:59] nuria_, all our reports from reportupdater do not use limn-public-data path currently
[19:34:00] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/statistics.pp#L82
[19:34:05] mforns: in srv
[19:34:20] mforns: but i think the http endpoint does:
[19:34:31] but someone might be using that path to rsync something else? I don't know
[19:34:40] mforns: yes, https://datasets.wikimedia.org/limn-public-data/
[19:34:51] aha
[19:35:50] k, i think all is well
[19:35:55] thanks mforns , milimetric
[19:36:04] np :]
[19:37:06] nuria_: maybe chown it to the stats user in case people are hitting that symlink from their crons
[19:39:45] milimetric: i think i need sudo to do that
[19:40:13] oh yeah! sorry, 'cause you can sudo -u stats but that doesn't help
[19:40:33] ottomata: would you mind sudo chown stats /a/limn-public-data on stat1003?
[19:41:28] yeah that /srv/ vs /a thing is always very confusing
[19:41:30] its a historical problem
[19:43:54] milimetric: not stats, but root, symlink perms are weird
[19:43:56] but, ja, done.
[19:44:33] oh ok, thx
[19:57:23] nuria_: added you as reviewer here
[19:57:24] https://gerrit.wikimedia.org/r/#/c/304879
[19:57:35] just a little script to help with replaying events captured in EventError streams
[19:57:50] ottomata: k, will look
[19:58:35] ottomata: and event error streams are reda from kafka?
[19:58:38] *are read
[19:58:39] whatever
[19:58:40] doesn't matter
[19:58:47] as long as there are only EventError events in the stream
[19:58:57] i just replayed failed events from the log files on the eventbus hosts
[19:59:06] eventbus did drop a few events during broker restarts :(
[19:59:09] but we captured them in local files
[19:59:19] so I just replayed them directly from the files back into kafka using this script
[19:59:36] ottomata: nice
[20:18:42] Analytics-Kanban, EventBus, Patch-For-Review: Upgrade kafka main clusters to 0.9 - https://phabricator.wikimedia.org/T138265#2395136 (Ottomata) Done. eventlogging-service-eventbus did drop a few events during broker restarts. GAHHHHH. We captured these events in a file and then replayed them manua...
[20:28:04] ottomata: looks good , should i merge?
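Re the limn-public-data symlink discussion between 18:57 and 19:44: a small Python sketch of why removing the link left the data intact and how the link can be recreated. It runs entirely in a throwaway temp directory; the directory and file names only mimic the ones mentioned in the channel and are not the real stat1003 layout.

```python
import os
import tempfile


def symlink_demo():
    """Remove and recreate a symlink, reporting what happens to its target."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-in for the real output directory (hypothetical layout).
        target = os.path.join(root, "reportupdater_output")
        os.mkdir(target)
        report = os.path.join(target, "report.tsv")
        with open(report, "w") as handle:
            handle.write("data\n")

        # Stand-in for the backwards-compat link.
        link = os.path.join(root, "limn-public-data")
        os.symlink(target, link)

        # Unlinking removes only the link entry, never the directory behind it.
        os.unlink(link)
        link_gone = not os.path.lexists(link)
        target_intact = os.path.isfile(report)

        # Recreating the link restores the old path.
        os.symlink(target, link)
        link_restored = os.path.realpath(link) == os.path.realpath(target)
        return link_gone, target_intact, link_restored


if __name__ == "__main__":
    print(symlink_demo())
```

The same reasoning covers a second link that points at a parent directory: a path reached through it stops resolving when the inner link is removed, which can look like a deletion even though no content was touched.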
[20:31:08] Hm, I scheduled a few public reports and they were all marked SUCCESS instantly but there is actually no output https://metrics.wmflabs.org/static/public/5690654/full_report.json
[20:36:37] nuria_: sure!
[20:40:25] bye a-team! see you tomorrow :]
[20:40:30] nite dude
[20:40:30] bye!
[20:40:46] the dude :]
[20:43:13] (PS4) Nuria: Bug fixes on datepicker [analytics/dashiki] - https://gerrit.wikimedia.org/r/303693 (https://phabricator.wikimedia.org/T141165)
[21:06:34] (PS13) Milimetric: [WIP] Oozify sqoop import of mediawiki tables [analytics/refinery] - https://gerrit.wikimedia.org/r/303339 (https://phabricator.wikimedia.org/T141476)
[21:26:18] nuria_: , Pchelolo, added EventBus replay documentation:
[21:26:19] https://wikitech.wikimedia.org/wiki/EventBus/Administration#Replaying_failed_events
[21:26:25] note that we haven't actually deployed the eventlogging-replay script
[21:26:29] but once we do that's how to do it
[21:28:11] ottomata: got a sec for a yarn / oozie question?
[21:28:22] I finally got the other details sorted out and now I'm getting:
[21:28:24] https://hue.wikimedia.org/jobbrowser/jobs/job_1468526822215_95233/single_logs
[21:28:34] "Encountered IOException running import job: java.io.IOException: The ownership on the staging directory /user/yarn/.staging is not as expected. It is owned by hdfs. The directory must be owned by the submitter yarn or by yarn"
[21:28:51] I tried submitting the job as hdfs but didn't seem to work
[21:29:08] hm.
[21:30:10] HMMMMM
[21:31:28] milimetric: I am going to set /user/yarn ownership to the yarn user, that is a little funky
[21:31:46] ok
[21:32:06] (looks like maybe I have other errors too, so this might have happened before too)
[21:33:16] ok milimetric done, try again
[21:34:08] k (gotta fix some other stuff)
[21:34:38] k, milimetric i'm around but im' getting up from compy
[21:36:05] k, don't stick around for my sake, I've plenty to do it looks like :/
[22:13:04] (PS14) Milimetric: [WIP] Oozify sqoop import of mediawiki tables [analytics/refinery] - https://gerrit.wikimedia.org/r/303339 (https://phabricator.wikimedia.org/T141476)
[22:13:34] I'm giving up for today, but Andrew if you read this that issue is gone
[22:13:56] I still can't get oozie to run sqoop that works perfectly fine outside of sqoop, and I'm running out of ideas
[22:14:02] but I'll try again tomorrow :)
[22:14:04] nite al
[22:14:05] *all
[22:16:43] milimetric: maybe we can find time and bang it out together this week
[22:16:46] nighters!
[22:17:19] sure, if contractor stuff calms down here I can come over one day
[22:27:57] Analytics-Dashiki, Analytics-Kanban: Default date selection to currently applied date for browser reports - https://phabricator.wikimedia.org/T141165#2555498 (Nuria) a:Nuria
[22:32:01] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2555506 (Nuria) I have put a "1 day" dataset here: https://datasets.wikimedia.org/limn-public-data/caching/ Please take a look, if it looks good we can provide 2 weeks worthy of files.
[22:52:03] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1743135 (AlexMonk-WMF) Puppet on this host is broken: Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item analytics_...
[23:01:08] Analytics, Pageviews-API, Reading-analysis: Suddenly outrageous higher pageviews for main pages - https://phabricator.wikimedia.org/T141506#2555719 (debt) Thanks, @Nuria - I've opened this ticket T143064 to update our eventlogging schema for the Wikipedia portal page (https://www.wikipedia.org)
[23:14:09] Analytics-EventLogging: Investigate duplicate EventLogging rows - https://phabricator.wikimedia.org/T142667#2555765 (JKatzWMF) @Nuria curious to hear your take on this, as it is has a fairly significant in some areas