[06:34:38] 10Analytics-EventLogging, 06Analytics-Kanban, 10DBA, 10ImageMetrics, 13Patch-For-Review: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#3132872 (10Marostegui) So far so good, but let's give it a couple of more days: ``` root@EV...
[07:01:55] 06Analytics-Kanban, 06Operations, 06WMDE-Analytics-Engineering, 13Patch-For-Review, 15User-Addshore: /a/mw-log/archive/api on stat1002 no longer being populated - https://phabricator.wikimedia.org/T160888#3132915 (10elukey) Removed `/srv/mw-log/archive/api_log_backup_elukey/*` from mwlog1001 and verified...
[07:34:11] !log re-run mediacounts-load-wf-2017-3-24-14 from hue
[07:34:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:35:38] ah just re-failed!
[07:35:45] hello oozie, always a pleasure
[07:48:41] (03PS1) 10Mforns: Fix domain_abbrev_map job to disambiguate wikimedia projects [analytics/refinery] - 10https://gerrit.wikimedia.org/r/344914 (https://phabricator.wikimedia.org/T156388)
[07:50:15] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
[07:51:48] brb
[08:45:57] 06Analytics-Kanban, 06Operations, 06WMDE-Analytics-Engineering, 13Patch-For-Review, 15User-Addshore: /a/mw-log/archive/api on stat1002 no longer being populated - https://phabricator.wikimedia.org/T160888#3133065 (10Addshore) >>! In T160888#3132915, @elukey wrote: > @Addshore is everything ok from your s...
[08:46:58] hi team :]
[08:47:20] o/
[08:47:46] hey elukey how was vacation :] ?
[08:48:58] goood :)
[08:49:36] Scotland is very windy and cold, but really nice.. I haven't seen the north of Scotland though, next trip
[08:53:56] elukey, :]
[08:55:02] joal, yt?
[08:58:29] Hi mforns
[08:58:33] Hi elukey :)
[08:58:42] hello joal :] how are you?
[08:58:48] I'm good !
[08:58:54] How about you mforns ?
[08:59:05] I'm good-ish :]
[08:59:13] do you want to sync-up in da cave?
[08:59:38] sure
[09:08:42] elukey, can you join us in the cave? :]
[09:09:37] mforns: sure, 2 mins
[09:09:42] np
[09:24:12] (03PS1) 10Mforns: Temporary remove monitoring for double deploy [analytics/aqs] - 10https://gerrit.wikimedia.org/r/344918 (https://phabricator.wikimedia.org/T156391)
[09:24:36] (03CR) 10Mforns: [V: 032 C: 032] Temporary remove monitoring for double deploy [analytics/aqs] - 10https://gerrit.wikimedia.org/r/344918 (https://phabricator.wikimedia.org/T156391) (owner: 10Mforns)
[09:30:21] (03PS1) 10Mforns: Update aqs to 12e1dbf [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/344919
[09:30:41] (03CR) 10Mforns: [V: 032 C: 032] Update aqs to 12e1dbf [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/344919 (owner: 10Mforns)
[09:43:41] (03PS1) 10Mforns: Restoring monitoring for pagecounts endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/344921
[09:44:03] (03CR) 10Mforns: [V: 032 C: 032] Restoring monitoring for pagecounts endpoint [analytics/aqs] - 10https://gerrit.wikimedia.org/r/344921 (owner: 10Mforns)
[09:50:28] (03PS1) 10Mforns: Update aqs to b25e965 [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/344923
[09:50:43] (03CR) 10Mforns: [V: 032 C: 032] Update aqs to b25e965 [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/344923 (owner: 10Mforns)
[09:56:14] 10Analytics, 06Operations, 06Reading-Web-Backlog, 10Traffic: mobile-safari has very few internally-referred pageviews - https://phabricator.wikimedia.org/T148780#2732531 (10Nemo_bis) >>! In T148780#2891117, @mforns wrote: > Until today, there are certain browser versions that are not populating the referre...
[10:14:20] elukey: lucaaa I missed you!
[10:15:16] * elukey sends some wikilove to fdans
[10:15:58] awyissss sum wikiloooove
[10:18:46] 10Analytics: Add mobile-site to AQS legacy pagecounts metric - https://phabricator.wikimedia.org/T161494#3133296 (10mforns)
[10:20:39] 10Analytics: Add AQS's new pagecounts endpoint to mediawiki-services-restbase - https://phabricator.wikimedia.org/T161495#3133302 (10mforns)
[10:47:03] elukey: do we spend a minute discussing analytics1044?
[10:50:28] joal: sure
[10:50:36] elukey: cave?
[10:50:41] joining
[11:16:17] * elukey lunch!
[11:37:21] !log Start manual sqoop for failed wikis (dawiki, cebwiki, srwiki)
[11:37:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:49:34] taking a break a-team
[11:50:46] milimetric: hi! mind batcaving with me for a couple minutes when you get here? :)
[13:03:35] !log fixed permissions (hdfs:hdfs -> root:root for /var/lib/hadoop/data)
[13:03:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:03:44] hiii
[13:04:01] 10Analytics, 10Analytics-Cluster, 06Operations, 10hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3133807 (10faidon) a:05Ottomata>03RobH
[13:13:31] hiiiiii
[13:35:54] 10Analytics-Tech-community-metrics, 06Developer-Relations (Jan-Mar-2017): Deployment of Maniphest panel - https://phabricator.wikimedia.org/T138002#2386415 (10Albertinisg) @Aklapper we've updated the dashboard today, finally adding the information of Maniphest. You can check it in: https://wikimedia.biterg.io/...
[13:53:08] joal: adding some weirdness
[13:53:16] elukey@analytics1044:/var/log/hadoop-yarn$ grep -i DiskErrorException yarn-yarn-nodemanager-analytics1044.log | cut -d " " -f 5 | cut -d "/" -f1,2,3,4 | cut -d "_" -f1,2 | sort | uniq -c | sort -n -k 1
[13:53:20] 3 usercache/fdans/appcache/application_1488294419903
[13:53:22] 16 usercache/druid/appcache/application_1488294419903
[13:53:25] 19 usercache/analytics-search/appcache/application_1488294419903
[13:53:28] 20 usercache/nuria/appcache/application_1488294419903
[13:53:30] 57 usercache/ebernhardson/appcache/application_1488294419903
[13:53:33] 128 usercache/elukey/appcache/application_1488294419903
[13:53:35] 131 usercache/jdcc/appcache/application_1488294419903
[13:53:38] 405 usercache/bd808/appcache/application_1488294419903
[13:53:40] 2261 usercache/bearloga/appcache/application_1488294419903
[13:53:43] 5521 usercache/hdfs/appcache/application_1488294419903
[13:53:44] for some reason, application_1488294419903 seems to be in all the errors
[13:54:49] elukey: i'm not totally sure how application ids get assigned, but isn't that not a full id?
[13:54:56] i could be wrong
[13:55:16] but i kinda thought it was something like: application_<clusterTimestamp>_<jobNumber>
[13:55:27] ahhhh that would make sense yes
[13:55:30] so the first number would be all the same for all jobs for a while
[13:55:36] i am about 75% sure about that ^
[13:55:38] i could be very wrong
[13:55:45] super, thanks for the explanation :)
[13:56:00] I have officially no idea about what's happening then :)
[13:56:06] (on analytics1044)
[13:56:33] yeah
[13:56:37] me neither, it's really strange!
[13:57:11] elukey: for this one, you reinstalled without wiping data disks, right?
[13:57:42] yep.. only 1040 was wiped
[13:58:15] hm
[13:58:43] elukey: do we have the paths to the files that caused the error?
[13:58:48] the error is something like file is missing, right?
[13:59:06] maybe all the errors happen for files on the same disk?
[13:59:15] it seems so.. I checked /var/lib/hadoop/data/X/yarn/local's perms (that are configured as yarn.nodemanager.local-dirs), nothing stands out
[13:59:17] i think not, right? the errors just print relative paths?
[13:59:47] yeah it says: "didn't find anything on all the yarn.nodemanager.local-dirs"
[14:00:37] I thought it could have been a debian issue but it happens only on 1044
[14:06:16] !log restart hadoop-yarn-nodemanager on analytics1044
[14:06:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:06:32] worth a try! :)
[14:07:41] ah yes at this point I am doing the "turn off/on again" :)
[14:09:15] luca's hammer
[14:10:58] ottomata: wdyt about switching hue.w.o to thorium?
[14:12:04] les do it
[14:13:56] 06Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#2656945 (10chasemp) @Milimetric @Nuria I propose we shut this down today or tomorrow just to make sure you guys don't figure out you need something from it before the 3/31 deadline :) (It's currently 1 of 3 remaining)
[14:14:31] (03CR) 10Ottomata: "re table var names, sounds ok to me! Just make sure that there is a historical/README file, or at least some comments in workflow.xml tha" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/339421 (https://phabricator.wikimedia.org/T156388) (owner: 10Mforns)
[14:19:05] ottomata: merging!
[14:19:16] 06Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#3134005 (10chasemp)
[14:20:31] mmmmm does it need an apache vhost?
[14:21:22] elukey: no
[14:21:49] pretty sure there's no vhost in front of hue on an27
[14:22:01] there used to be (the nginx one) a long time ago, when we did our own ssl termination
[14:22:09] nice
[14:22:10] but that is handled by the usual nginx routing stuff now
[14:23:05] 'analytics1027' => { # Hue (Hadoop GUI)
[14:23:05] 'backend' => 'analytics1027.eqiad.wmnet',
[14:23:05] 'be_opts' => { 'port' => 8888 },
[14:23:06] },
[14:23:09] ottomata: --^
[14:23:56] no bueno
[14:24:25] thorium has multiple websites and all of them are listening on port 80
[14:24:37] hmmm
[14:24:59] so we'd need a new director for thorium with a different name?
[14:25:05] thorium_hue
[14:25:06] ?
[14:26:06] or an apache vhost on thorium that proxies to localhost:8888
[14:49:13] 10Analytics-Tech-community-metrics: Maniphest: Parser does not split projects by comma separator? - https://phabricator.wikimedia.org/T161519#3134040 (10Aklapper)
[14:54:14] 10Analytics-Tech-community-metrics, 06Developer-Relations, 07Epic: Complete migration to new Bitergia's development dashboard (and then kill korma.wmflabs.org) - https://phabricator.wikimedia.org/T137997#3134074 (10Aklapper)
[14:54:17] 10Analytics-Tech-community-metrics, 10Phabricator, 06Developer-Relations (Jan-Mar-2017): Decide on wanted metrics for Maniphest in kibana - https://phabricator.wikimedia.org/T28#3134075 (10Aklapper)
[14:54:19] 10Analytics-Tech-community-metrics, 06Developer-Relations (Jan-Mar-2017): Deployment of Maniphest panel - https://phabricator.wikimedia.org/T138002#3134070 (10Aklapper) 05Open>03Resolved Yay! Thanks and congratulations! Let's close this. :) I might report some followup bugs (but so far I've only found T16...
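A note on the application-id confusion above: ottomata's guess is correct. YARN ids have the form application_<clusterTimestamp>_<sequenceNumber>, where the first number is the ResourceManager's start time in milliseconds, shared by every application submitted until the ResourceManager restarts. A minimal Python sketch of the decomposition:

```python
# YARN application ids: application_<clusterTimestamp>_<sequence>.
# The cluster timestamp is the ResourceManager start time (ms since epoch),
# which is why 1488294419903 appears in every path elukey grepped out above.
def parse_app_id(app_id):
    prefix, cluster_ts, seq = app_id.split('_')
    assert prefix == 'application', 'not a YARN application id'
    return int(cluster_ts), int(seq)

print(parse_app_id('application_1488294419903_77938'))  # (1488294419903, 77938)
```

So the shared prefix in the DiskErrorException paths only says the jobs all ran under the same ResourceManager incarnation, not that a single application caused them all.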
[14:58:11] elukey: yeah i guess that would work, seems a little annoying...buuuut maybe we'd retain the advantage of having more control of requests to hue if we use vhost
[14:58:18] like, if we wanted to add extra auth or something
[14:58:52] ottomata: https://gerrit.wikimedia.org/r/#/c/344952 :)
[14:59:23] (running pcc)
[14:59:58] +1 :)
[15:00:37] ping fdans milimetric elukey stadduppp
[15:15:15] ottomata: new vhost running on thorium, it looks ok but if you could double check it would be great :)
[15:15:21] after this we can merge the VCL changes
[15:15:28] and flip hue to thorium
[15:15:44] * elukey prepares the missiles to nuke analytics1027
[15:16:42] mobrovac: Hello!
[15:16:54] hello joal!
[15:17:01] doing well, mobrovac?
[15:17:23] yes yes, Latin America is not bad :)
[15:17:26] :D
[15:17:40] :P
[15:18:00] mobrovac: I don't have rights to add CR to restbase and I'd like to update metrics.yaml
[15:18:08] mobrovac: can you do something for me? (gerrit)
[15:18:48] for restbase we use github, not gerrit
[15:18:56] we push to gerrit only before a deploy
[15:19:23] so you'll need to do a PR against https://github.com/wikimedia/restbase joal
[15:19:28] mobrovac: Ah! Since there was a gerrit repo, I thought it was the opposite
[15:19:34] mobrovac: Will make a PR :)
[15:19:39] cool thnx
[15:28:54] ottomata: qq - do we need to rsync anything from an1027 and/or create users on thorium before the switch?
[15:29:21] (I am mentally checking all the things that I could break)
[15:29:28] mforns: https://github.com/wikimedia/restbase/pull/782
[15:29:34] mobrovac: for you as well --^
[15:29:43] thanks joal :]
[15:29:57] (03CR) 10Nuria: [V: 031 C: 031] Fix domain_abbrev_map job to disambiguate wikimedia projects [analytics/refinery] - 10https://gerrit.wikimedia.org/r/344914 (https://phabricator.wikimedia.org/T156388) (owner: 10Mforns)
[15:30:43] kk thnx
[15:35:03] elukey: don't think so
[15:35:08] most people don't have access to an27
[15:35:12] just you and me and joal
[15:35:21] and others but i doubt folks log in there cept you and me
[15:36:06] ottomata: Will you have some time for me later on today (sqoop failure)
[15:36:09] ?
[15:36:38] joa fo sho
[15:37:32] thonx ottomata b
[15:40:05] ottomata: I meant the manual step to sync users from LDAP etc.., but everything seems ready to go
[15:40:07] 10Analytics: Provide historical redirect flag in Data Lake edit data - https://phabricator.wikimedia.org/T161146#3122912 (10Nuria) We will be able to do this once our changes regarding parsing text (content, not metadata) are final
[15:40:14] all right we can merge after the meetings :)
[15:40:22] 10Analytics: Provide historical redirect flag in Data Lake edit data - https://phabricator.wikimedia.org/T161146#3122912 (10Nuria) p:05Triage>03Normal
[15:40:38] 10Analytics: Update refinery sqoop script to explicitely fail in case a snapshot / destination folder already exists - https://phabricator.wikimedia.org/T161128#3122505 (10Nuria) p:05Triage>03Normal
[15:42:09] 10Analytics: Provide edit tags in the Data Lake edit data - https://phabricator.wikimedia.org/T161149#3122965 (10Nuria) is this data available in mediawiki? It used to be that tags were linked to how users registered rather than where the edit happened, is that now fixed?
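The vhost change itself ( https://gerrit.wikimedia.org/r/#/c/344952 ) is not quoted in the log. For readers, here is a rough sketch of what "an apache vhost on thorium that proxies to localhost:8888" looks like with mod_proxy; the ServerName and details are illustrative assumptions, not the merged config:

```apache
# Hypothetical sketch only; the real change is in Gerrit 344952.
<VirtualHost *:80>
    ServerName hue.wikimedia.org
    ProxyPreserveHost On
    ProxyPass        / http://localhost:8888/
    ProxyPassReverse / http://localhost:8888/
</VirtualHost>
```

This keeps thorium serving everything on port 80, so the cache layer can reuse the existing thorium backend instead of needing a dedicated director like the old 'analytics1027' entry with 'port' => 8888.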
[15:44:33] 10Analytics: Provide cumulative edit count in Data Lake edit data - https://phabricator.wikimedia.org/T161147#3122928 (10Nuria) p:05Triage>03Normal
[15:44:45] 10Analytics: Provide edit tags in the Data Lake edit data - https://phabricator.wikimedia.org/T161149#3122965 (10Nuria) p:05Triage>03Normal
[15:45:31] 10Analytics: Use native timestamp types in Data Lake edit data - https://phabricator.wikimedia.org/T161150#3134248 (10Nuria) p:05Triage>03Normal
[15:45:42] 10Analytics: Use native timestamp types in Data Lake edit data - https://phabricator.wikimedia.org/T161150#3122979 (10Nuria) Putting this on Q4
[15:47:17] 10Analytics, 06Editing-Analysis: Pivot "MediaWiki history" data lake: Feature request for "Event Users" - https://phabricator.wikimedia.org/T161185#3134263 (10Nuria) Sorry, this is a bit cryptic, can you explain a little more what you mean?
[15:49:19] 10Analytics, 06Editing-Analysis: Pivot "MediaWiki history" data lake: Feature request for "Event Users" - https://phabricator.wikimedia.org/T161185#3123991 (10JAllemandou) Hi @Jdforrester-WMF , I don't understand your request. If you look at `Event Entity: revision` and `Event Type: Create`, you can get all ev...
[15:51:05] 06Analytics-Kanban: Add AQS's new pagecounts endpoint to mediawiki-services-restbase - https://phabricator.wikimedia.org/T161495#3134327 (10Nuria) p:05Triage>03High a:03JAllemandou
[15:52:05] 10Analytics: Add mobile-site to AQS legacy pagecounts metric - https://phabricator.wikimedia.org/T161494#3134332 (10Nuria) p:05Triage>03High
[15:52:19] 06Analytics-Kanban: Add mobile-site to AQS legacy pagecounts metric - https://phabricator.wikimedia.org/T161494#3133284 (10Nuria) a:03mforns
[15:53:02] 06Analytics-Kanban: Document and publicize AQS legacy pageviews - https://phabricator.wikimedia.org/T159959#3134342 (10Nuria)
[15:54:18] 10Analytics, 10Analytics-Cluster: Prevent notebooks on spark to launch 2 pyspark instances instead of 1 - https://phabricator.wikimedia.org/T152522#3134345 (10Nuria) p:05Triage>03Low
[15:54:59] 10Analytics, 10EventBus, 07Easy, 06Services (watching): EventBus logs don't show up in logstash - https://phabricator.wikimedia.org/T153029#3134346 (10Nuria) p:05Triage>03Low
[15:56:09] 10Analytics: Investigate adding user-friendly testing functionality to Reportupdater - https://phabricator.wikimedia.org/T156523#3134351 (10Nuria) p:05Triage>03Low
[15:56:39] 10Analytics, 10Analytics-Dashiki, 13Patch-For-Review: Create dashboard for upload wizard - https://phabricator.wikimedia.org/T159233#3134354 (10Nuria) p:05Triage>03Normal
[15:57:34] 10Analytics, 10Analytics-Dashiki, 13Patch-For-Review: Create dashboard for upload wizard - https://phabricator.wikimedia.org/T159233#3061266 (10Nuria) Ping matthiasmullie, will you be able to continue this work? seems that it is half way there
[15:58:02] joal: merged, will deploy soon-ish
[15:58:13] awesome - Thanks a lot mobrovac !
[15:58:16] mforns: --^
[15:58:25] :D
[16:05:52] !log Relaunch corrected denormalize oozie job
[16:05:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:10:12] ottomata: A minute for sqoop now ?
[16:10:55] joal: in ops meeting...
[16:11:06] but, type and i maybe can help async like
[16:11:08] what's up?
[16:17:45] ottomata: sqoop job failed because it needs all its subinstances to have been successful
[16:18:04] And in our case, 3 failed (dawiki, cebwiki, srwiki)
[16:18:36] hm, ok
[16:18:48] Issue is that I have not found any interesting logs about the failures :(
[16:18:52] why did they fail?
[16:18:53] ah ok
[16:18:54] ha
[16:19:09] joal: app id?
[16:19:22] ottomata: let me get back there
[16:22:51] 06Analytics-Kanban: Investigate duplicate EventLogging rows - https://phabricator.wikimedia.org/T142667#3134479 (10Nuria) Of 4328945 events for a couple weeks of NavigationTiming data about 400 are duplicated on NavigationTiming so about ~0.01% and basically the totality of those is happening on Windows NT (the...
[16:23:11] ottomata: job_1488294419903_75131
[16:24:23] Actually ottomata, seems not to be the correct one, sorry
[16:25:45] ottomata: logs for sqoop are immensely verbose :(
[16:26:32] just heard "cassandra 3" in the ops meeting joal
[16:26:39] :P
[16:26:49] elukey: Wow - Not sure if I should be pleased or scared :)
[16:27:01] I had the same feeling! :D
[16:28:43] joal: http://giphy.com/gifs/scared-johnny-depp-ksA9yBw34Hpvi
[16:31:01] nuria:
[16:31:05] Hello 1
[16:31:12] Looks like you can't hear me
[16:53:59] hey everyone, I'm around just making time up from Friday when i had to leave early. So I got a couple of hours of styling, but I'm around if anyone needs any help
[16:57:24] 10Analytics, 06Editing-Analysis: Pivot "MediaWiki history" data lake: Feature request for "Event Users" - https://phabricator.wikimedia.org/T161185#3123991 (10Neil_P._Quinn_WMF) I believe James wants to get the number of distinct //users// who created a revision during the time span. Right now, you can see the...
[16:58:44] joal: https://wikimedia.org/api/rest_v1/ :)
[16:58:57] legacy is live
[16:59:01] cc elukey ^
[16:59:35] \o/
[17:00:09] will update the dashboard too
[17:00:43] yay mobrovac :)
[17:02:46] 10Analytics: Provide edit tags in the Data Lake edit data - https://phabricator.wikimedia.org/T161149#3134624 (10Neil_P._Quinn_WMF) >>! In T161149#3134217, @Nuria wrote: > is this data available in mediawiki? Yes, it's available in the `change_tag` and `tag_summary` tables. >It used to be that tags were linked...
[17:03:44] 06Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#3134627 (10Milimetric) np, go for it @chasemp
[17:06:39] 10Analytics, 10Analytics-EventLogging: Find an alternative query interface for eventlogging Mariadb storage - https://phabricator.wikimedia.org/T159170#3058941 (10jcrespo) > lack of DBA resources to dedicate to general upkeep of the eventlogging boxes Actually, we could continue supporting MariaDB (we supp...
[17:11:54] ottomata,joal - on an1044 the DiskErrorException stopped three hours ago, around the time that I restarted the node manager daemon
[17:12:05] not sure if this will be a fix but looks promising
[17:12:25] let's see
[17:12:27] elukey: I really hate it when linux needs windows-style fixes ...
[17:13:22] this might be some weirdness started during the first puppet runs, so if this works I'll add a step to restart the daemons before declaring a host reimaged to debian
[17:13:37] ottomata: I finally managed to find that applicationId (for sqoop failure logs): application_1488294419903_77938
[17:13:43] ottomata: sorry for the delay
[17:17:36] going offline for the day people
[17:17:40] ttl
[17:17:41] o/
[17:17:42] bye elukey
[17:17:46] Thanks for 1044 !
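For context, mobrovac's announcement above means the legacy pagecounts data is now queryable through the public REST API. A sketch of a request, assuming the endpoint shape documented at https://wikimedia.org/api/rest_v1/ (all-sites is used because mobile-site was still pending per T161494; field names should be checked against the docs):

```python
# Query the newly-live AQS legacy pagecounts endpoint for monthly counts.
import requests

URL = ('https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/'
       'aggregate/en.wikipedia/all-sites/monthly/2016010100/2017010100')
resp = requests.get(URL, headers={'User-Agent': 'analytics-log-example'})
resp.raise_for_status()
for item in resp.json()['items']:
    print(item['timestamp'], item['count'])
```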
[17:24:02] 10Analytics, 06Editing-Analysis: Pivot "MediaWiki history" data lake: Feature request for "Event Users" - https://phabricator.wikimedia.org/T161185#3134728 (10JAllemandou) I think there is another confusion then: anonymous doesn't provide a //distinct// metric - It only provides a filter metric. So when you us...
[17:25:59] k joal looking at it...
[17:27:27] joal: this eh?
[17:27:27] com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
[17:31:45] joal: why can't i find that job in yarn ui / hue?
[17:39:32] ottomata: I don't know !
[17:39:36] joal: just curious
[17:39:41] it is indeed
[17:39:49] what's the difference between sqoop '--mappers' and sqoop '--processors'
[17:39:51] ?
[17:40:22] ah: 1 sqoop job = 1 table of 1 wiki (with n mappers)
[17:40:51] we parallelise sqoop jobs using python, the script launches k processors in parallel
[17:40:52] ah ok
[17:40:56] ahhh
[17:41:06] ahh sorry ok
[17:41:10] processors is not a sqoop flag
[17:41:11] got it
[17:41:31] ottomata: something else to know: parallelisation happens at wiki level, a single python processor processes all tables of 1 wiki
[17:41:38] aye ok
[17:41:56] ottomata: this python parallelisation makes it difficult to read logs as well
[17:42:41] ottomata: And we might think of ways to reduce those sqoop logs - With 2 or 3 runs, we already are at 88M
[17:43:47] ottomata: I also experienced a weird oozie/spark thing minutes ago
[17:43:57] ottomata: a job failed with weird errors
[17:46:04] oh?
[17:46:15] i'm trying to find mysql server logs for the server where the sqoop queries ran
[17:46:21] this seems to be an issue with the server
[17:46:26] Ah, right
[17:47:02] ottomata: do you think it'd be worth having a to_retry queue, where to put failed jobs to retry them at least once?
[17:47:35] ottomata: it feels a bit overkill, but it's a pain if we have to rerun a full sqoop for small failures :(
[17:49:39] to_retry queue where?
[17:49:43] in the python script?
[17:49:45] in python
[17:49:47] yeah
[17:49:47] yeah
[17:49:48] can't hurt
[17:49:49] hm
[17:49:52] true
[17:49:54] with a flag for number of retries
[17:50:02] something like that
[17:50:24] the idea with the retry queue is to prevent rerunning them in a row
[17:50:30] hm, will think again
[17:56:18] joal: q: has this job, as it is, worked 100% on this labsdb before in the past?
[17:57:37] 10Analytics, 10EventBus, 10Wikimedia-Stream, 06Services (designing), 15User-mobrovac: Puppetize event schema topic configuration - https://phabricator.wikimedia.org/T161027#3134828 (10mobrovac)
[17:58:01] 06Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#3134829 (10chasemp) p:05Triage>03Normal > | a412cfb3-d6a8-413c-b97b-571004f37803 | limn1 | analytics | ACTIVE | - | Running | public=10.68.16.94...
[17:58:09] ottomata: it has always been different: the list of wikis was (a bit) smaller (100 wikis less, I think)
[18:00:45] hm
[18:01:01] ottomata: however the 3 wikis that failed earlier on were successfully run manually today
[18:01:06] ok, that is good to know
[18:01:17] that rules out possible mysql permission problems on just those wikis
[18:01:25] it's hard to know what happened, there are no mysql errors on the server at that time
[18:01:29] none logged anyway
[18:01:29] ottomata: I also checked schemas, they look ok
[18:01:48] so yeah, looks like a DB issue more than anything
[18:01:55] joal: the app id you sent me was for a particular wiki?
[18:01:59] sqoop db job?
[18:02:01] ottomata: maybe we'll wait and see for the april job (soon)
[18:02:33] ottomata: yes, it was 1 sqoop job, therefore 1 table of 1 wiki
[18:03:06] do you know which one?
[18:03:09] which wiki was that?
[18:03:13] I can find it
[18:06:14] joal: strange that multiple attempts failed too
[18:08:19] ottomata: same type of errors for the 3 wikis (dawiki, cebwiki and srwiki), on table revision
[18:08:59] joal: did the other tables succeed? or did it just quit on first failure?
[18:09:21] ottomata: I think they tried revision first (there was no other data)
[18:09:22] joal: we should see if those 3 jobs were running and failing at about the same time
[18:10:22] ottomata: I got a 1 minute interval between them
[18:10:40] grepping the output logs on an03
[18:10:42] 2017-03-24T02:37:27 for the ERROR (after failure) of the last one
[18:10:52] i see attempt failures between 2:31 and 3:29
[18:11:01] ottomata: it's around lines 890000
[18:11:11] grep 'Status : FAILED' sqoop-mediawiki.log | grep '17/03/24'
[18:11:43] better:
[18:11:44] grep -C 20 -E '17/03/24.*Status : FAILED' sqoop-mediawiki.log
[18:12:14] ottomata: Yes, but you don't get stacks :)
[18:12:19] stacks?
[18:12:21] h
[18:12:22] oh
[18:12:27] I tail | less and search
[18:12:29] ya but the stacks are all the same
[18:12:32] correct
[18:12:32] i was doing that at first too
[18:12:38] the latter grep I have gives some stack
[18:12:41] last command i pasted
[18:12:46] can expand -C
[18:12:50] more context lines
[18:12:56] right
[18:13:23] anyway, i guess the python script finished running around 20:something
[18:13:36] 20:13 is the last log in the file
[18:13:51] are there any successes between 2:31 and 3:29?
[18:13:56] guess i gotta just read...
[18:15:42] joal: ?
[18:15:42] 2017-03-24T02:36:57 INFO FINISHED: cebwiki
[18:15:50] oh it says FINISHED even if failed?
[18:15:55] I think so
[18:16:34] looks like ruwiktionary was started and finished between 02:31 and 3:29
[18:16:37] 2017-03-24T02:47:04 INFO FINISHED: ruwiktionary
[18:17:00] 2017-03-24T02:35:57 INFO STARTING: ruwiktionary
[18:17:00] 2017-03-24T02:47:04 INFO FINISHED: ruwiktionary
[18:17:16] so ok, v confusing then
[18:17:34] some jobs failed even though, while they were failing, others succeeded
[18:17:42] joal: maybe we should rerun and try with --processors 1
[18:17:42] ?
[18:18:02] ottomata: we could - It'd be somewhat slower (for small wikis)
[18:18:05] but whatever
[18:19:03] ottomata: Will leave for now
[18:19:18] ottomata: I suggest we leave it as is for april, and see if it fails again or not
[18:19:25] ottomata: thoughts?
[18:19:57] joal these 3 wikis all new for this job?
[18:20:00] you said there were more wikis now
[18:20:22] ottomata: It's not new per se, it's that the test list we used was not full
[18:20:45] right, but i mean, did the previous tests you ran successfully contain these wikis
[18:20:50] nope
[18:20:59] so this is the first time we've tried to sqoop these wikis
[18:21:07] i'm going to try to sqoop cebwiki manually
[18:21:10] just in case...
[18:21:29] ottomata: I successfully did that a few hours ago
[18:21:32] oh you did?!
[18:21:33] hm.
[18:21:37] ok
[18:21:51] ottomata: I sqooped the 3 failed ones manually
[18:21:57] fdans: how is it going with the layout?
[18:22:07] fdans: I can continue if you are logging out
[18:22:34] joal: we should really change the yarn job names so it shows which wikis/tables it is working on
[18:22:40] ok joal have a good evening!
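Tying together joal's to_retry idea from [17:47:02] and ottomata's --processors 1 suggestion: a minimal sketch of how the python sqoop wrapper could queue failures and retry them serially at the end. sqoop_wiki() is a hypothetical stand-in for the wrapper's per-wiki work, not the real function name:

```python
# Sketch of a to_retry queue: parallel first pass, then serial retries
# (the moral equivalent of --processors 1) for whatever failed.
# sqoop_wiki() is hypothetical and assumed to return True on success.
from multiprocessing.dummy import Pool  # thread pool

def sqoop_all(wikis, processors=3, retries=1):
    to_retry = list(wikis)
    for attempt in range(1 + retries):
        pool = Pool(1 if attempt > 0 else processors)
        results = pool.map(lambda wiki: (wiki, sqoop_wiki(wiki)), to_retry)
        pool.close()
        pool.join()
        to_retry = [wiki for wiki, ok in results if not ok]
        if not to_retry:
            break
    return to_retry  # wikis still failing after all retries
```

If the CommunicationsException really was load-related on the db side, the serial second pass also reduces concurrent connections, so small transient failures no longer force a full rerun.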
[18:22:54] ottomata: agreed, but I think there might be an issue with names, and with checking if they already exist
[18:23:03] ok ottomata, more on this tomorrow :)
[18:23:06] nuria: configuring static files is no problem, but allowing tabs to use aqs is proving a bit tricky
[18:23:09] Thanks :)
[18:23:41] fdans: remember we said the first changeset to push would be the one that uses just the edit csv file
[18:24:23] ok, yes I think I can push that now
[18:25:14] fdans: ok. let me take a look and I will deploy, so we can have a 301 to the old reportcard
[18:30:26] oh, joal i wanted to talk to you about the revscoring dependency things, did you get my email?
[18:38:24] 10Analytics: Provide edit tags in the Data Lake edit data - https://phabricator.wikimedia.org/T161149#3134962 (10Nuria) Ok, we will keep this one in mind to add once data is being populated on a recurrent schedule w/o issues.
[18:38:51] cool, detail page now matches style, prototype updated
[18:39:19] nuria: now what we should decide is how much detail from the prototype to implement before starting the consultation
[18:39:33] there are a bunch of little pickers/selectors that imply a lot of dynamic behavior
[18:39:49] I was thinking we could skip them or make them dummies (selecting does nothing)
[18:39:55] milimetric: I think there is plenty for consultation, no need to add anything
[18:40:11] milimetric: or well, selecting does nothing sure.
[18:40:13] well, no, I think the mechanisms by which a user would change metrics / etc. are important
[18:40:25] and the style of the question picker is important
[18:40:30] so there's a little more work left
[18:40:35] but definitely this week I'll be ready
[18:40:49] maybe we can review tomorrow?
[18:52:21] milimetric: ok, but i think too much detail detracts from it being a prototype of a UI
[18:52:36] milimetric: question, where is the source of https://analytics.wikimedia.org/datasets/periodic/reports/metrics/wikistats/ in 1003?
[18:53:07] milimetric: ahm, i found it: nuria@stat1003:/srv/published-datasets/periodic$ cd
[18:53:11] cc fdans
[18:53:34] 10Analytics, 06Editing-Analysis: Pivot "MediaWiki history" data lake: Feature request for "Event Users" - https://phabricator.wikimedia.org/T161185#3134993 (10Neil_P._Quinn_WMF) Ah, I see. In that case, I think the underlying request here is to provide distinct user counts in addition to event counts. In editi...
[18:53:36] nuria: cooool, thanks!
[18:53:57] fdans: no, wait that directory doesn't have wikistats
[18:54:02] nuria@stat1003:/srv/published-datasets/periodic/reports/metrics$
[18:54:40] got it
[18:54:53] ottomata: question if you may
[18:55:28] ya?
[18:56:08] ottomata: where is the rsync source for https://analytics.wikimedia.org/datasets/periodic/reports/metrics/wikistats/?
[18:56:22] ottomata: in 1003 on /srv/published-datasets/periodic/reports/metrics
[18:56:32] ottomata: i do not see the wikistats directory
[18:58:09] nuria: same path, stat1002
[18:58:14] well
[18:58:18] /a on stat1002
[18:58:24] ahahah
[18:58:50] thank you
[18:58:54] cc fdans
[18:59:36] fdans: let me know when you have the file and config working and we can push changes, i would add it under a folder called reportcard
[18:59:56] nuria: sure thing, creating now
[19:00:05] to keep things clean for now (this is work we will most likely scrap later)
[19:00:34] fdans: you would need to test locally with the data, not from the major endpoint
[19:00:47] * fdans opens Transmit, the best program in the world
[19:05:04] 10Analytics, 06Research-and-Data: geowiki data for Global Innovation Index - https://phabricator.wikimedia.org/T131889#3135015 (10Rafaesrey) Dear Leila, Apologies for pestering so much. Do you have an update for the data? Thanks again for all your help and time. Best, Rafael. *From:* leila [m...
[19:06:00] 06Analytics-Kanban: Much more pageviews in Tagalog Wikipedia since mid-June 2016 - https://phabricator.wikimedia.org/T144635#3135017 (10Nuria) The spike is confined to June 2016 and it is coming from the Philippines, there is a major traffic drop in April-May right before the increase of June 2016. Increase is pres...
[19:06:26] 06Analytics-Kanban: Much more pageviews in Tagalog Wikipedia since mid-June 2016 - https://phabricator.wikimedia.org/T144635#3135018 (10Nuria) {F7040040}
[19:17:26] 06Analytics-Kanban: Much more pageviews in Tagalog Wikipedia since mid-June 2016 - https://phabricator.wikimedia.org/T144635#3135085 (10Nuria) {F7040158} Traffic coming to enwiki from the Philippines. Same drop spike is present
[19:18:08] 10Analytics-Cluster, 06Analytics-Kanban, 13Patch-For-Review: Review Druid's logging configuration - https://phabricator.wikimedia.org/T155491#2944710 (10Ottomata) I went ahead and just added the cron jobs. If we learn of a way to use log4j for the requests logs, we can revisit this then.
[19:19:02] nuria: I've got the file but I don't have permissions to change the datasets directory
[19:19:29] fdans: sudo -u hdfs?
[19:21:08] fdans: do you have permissions to do that?
[19:21:27] nuria: all good :)
[19:21:41] fdans: were you able to try sourcing the file locally?
[19:23:45] nuria: the file should be https://analytics.wikimedia.org/datasets/periodic/reports/metrics/reportcard/top_10_wikis_by_new_editors.tsv when it syncs
[19:23:45] updating config
[19:25:10] nuria: sorry, missed that step, testing now
[19:28:12] https://usercontent.irccloud-cdn.com/file/Bx0asuoT/Screen%20Shot%202017-03-27%20at%2021.27.41.png
[19:28:13] nuria: ^
[19:28:19] now, updating config
[19:28:51] just one thing
[19:29:08] fdans: sounds good
[19:29:10] if I remember correctly the original reportcard also included all projects, right?
[19:29:23] fdans: no, just top 10/15
[19:29:29] ah aggregated you mean?
[19:29:32] yes
[19:29:33] yes
[19:31:25] nuria: is there a place where there is that data or should I aggregate everything with a script?
[19:32:59] fdans: no, this is a one-off, we will produce all this data from the data lake but we do not have it yet
[19:34:21] nuria: ok, leave it like this then?
[19:35:20] fdans: for now, sure, we are going to have to work more on this later this quarter.
[19:38:16] right, config updated, I'll take it from here tomorrow o/
[19:41:13] ottomata: and .. another question if you may
[19:44:32] ya!
[19:44:34] nuria: ask away
[19:45:02] ottomata: who manages the crons that do the rsync?
[19:45:07] ottomata: not the hdfs user
[19:45:11] https://www.irccloud.com/pastebin/PHGwCU4g/
[19:45:39] ottomata: ah wait, the stats user then..
[19:46:02] nuria: root
[19:46:20] ottomata: ok, ya, could not see it on stats either.
[19:46:32] https://github.com/wikimedia/puppet/blob/c0cdc6205456ee702905640ebccb876c2d1c20ef/modules/statistics/manifests/compute.pp#L32-L42
[19:51:26] 06Analytics-Kanban: Much more pageviews in Tagalog Wikipedia since mid-June 2016 - https://phabricator.wikimedia.org/T144635#3135149 (10Nuria) Spikes due to connection issues like the ones we had with Chrome mentioned earlier in this ticket can be identified by a disproportionate number of requests to the Main_p...
[19:52:31] 06Analytics-Kanban: Pageview Spike in Tagalog Wikipedia mid-June 2016 - https://phabricator.wikimedia.org/T144635#2605825 (10Nuria)
[20:10:37] (03PS1) 10Nuria: Adding Reportcard to readme [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/345003
[20:18:36] (03PS1) 10Nuria: Moving reportcard to analytics.wikimedia.org [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/345004 (https://phabricator.wikimedia.org/T130117)
[20:20:13] (03Abandoned) 10Nuria: Updating Readme [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/344671 (owner: 10Nuria)
[20:21:00] (03CR) 10Nuria: [V: 032 C: 032] "Self merging to deploy now that old reportcard is down" [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/345003 (owner: 10Nuria)
[20:21:37] (03CR) 10Nuria: [V: 032 C: 032] "Self merging to deploy now that old reportcard is down" [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/345004 (https://phabricator.wikimedia.org/T130117) (owner: 10Nuria)
[20:23:00] (03Restored) 10Nuria: Updating Readme [analytics/analytics.wikimedia.org] - 10https://gerrit.wikimedia.org/r/344671 (owner: 10Nuria)
[20:24:14] 10Analytics, 10EventBus, 10Wikimedia-Stream, 06Services (designing), 15User-mobrovac: Puppetize event schema topic configuration - https://phabricator.wikimedia.org/T161027#3135224 (10mobrovac) The original idea of having the `event-schemas` repository was so that others can re-use the config (Vagrant, 3...
[20:35:56] !Log deployed reportcard skeleton code to analytics.wikimedia.org https://gerrit.wikimedia.org/r/#/c/345004/
[20:44:21] 06Analytics-Kanban, 13Patch-For-Review: Move reportcard to dashiki and new datasources - https://phabricator.wikimedia.org/T130117#2126320 (10bd808) I set up a redirect using https://wikitech.wikimedia.org/wiki/Nova_Resource:Redirects that emits a 301 redirect from https://reportcard.wmflabs.org/ to https://ana...
[20:46:19] ottomata: yt?
[20:49:08] ottomata: just committed some code to analytics.wikimedia.org but it looks like the source is not updating. see: https://gerrit.wikimedia.org/r/#/c/345004/, could it be that the git::clone is not working on thorium?
[22:24:05] 10Analytics-Tech-community-metrics: Maniphest: Parser does not split projects by comma separator? - https://phabricator.wikimedia.org/T161519#3134040 (10Albertinisg) Thanks for the report. You were right, we were using a wrong field. We've modified it, so now projects are displayed correctly: https://wikimedia.biterg.io...
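A closing note on nuria's last question: with WMF's puppet git::clone define, a checkout only updates when puppet runs, and only if the resource tracks the latest commit; the rsync crons ottomata pointed at (modules/statistics/manifests/compute.pp) are separate root crons. A loose sketch of the two moving parts — resource titles, paths, and schedules below are invented for illustration, not copied from the real manifests:

```puppet
# Illustrative only; see modules/statistics/manifests/compute.pp for the
# real cron definitions. Names and paths below are hypothetical.
git::clone { 'analytics/analytics.wikimedia.org':
    ensure    => 'latest',  # with 'present', new commits never reach the host
    directory => '/srv/analytics.wikimedia.org',
}

cron { 'rsync_published_datasets':
    user    => 'root',
    minute  => '*/15',
    command => '/usr/bin/rsync -rt stat1003.eqiad.wmnet::srv/published-datasets/ /srv/analytics.wikimedia.org/datasets/',
}
```

If the clone on thorium were pinned rather than tracking latest, that would be one explanation consistent with a merged change not showing up on the site.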