[02:26:43] Has anything pertinent changed with PHP or mysqli on Toolforge?
[02:27:15] I'm finding the tool will not connect, and I'm getting strange mysqli errors like "Packets out of order."
[02:27:36] But I can connect to my SQL instance just fine from other machines.
[02:33:38] !help
[02:33:38] If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-kanban
[02:53:49] Opened ticket: https://phabricator.wikimedia.org/T280215
[03:25:34] unlikely to have changed on the PHP/mysqli side
[03:26:22] Cyberpower678: https://stackoverflow.com/questions/63301495/pdoexception-packets-out-of-order-expected-0-received-1-packet-size-23 suggests you might have maxed out your connection limits?
[03:26:54] did you change anything on your side?
[04:51:25] stashbot, how's it going?
[04:51:26] See https://wikitech.wikimedia.org/wiki/Tool:Stashbot for help.
[09:15:05] !log project-proxy refresh hiera XFF entry for ws-export.wmcloud.org T279111
[09:15:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[09:15:11] T279111: Request to enable XFF headers for wikisource VPS project - https://phabricator.wikimedia.org/T279111
[09:40:39] !log globaleducation bumping quota: cores 20 -> 40, ram 40G -> 80G, instances 8 -> 14 (T279956)
[09:40:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Globaleducation/SAL
[09:40:47] T279956: Request increased quota for globaleducation Cloud VPS project (multi-node) - https://phabricator.wikimedia.org/T279956
[09:44:52] !log library-upgrader bumping quota: 16->20GB RAM, 8->10 CPUs (T280103)
[09:44:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Library-upgrader/SAL
[09:44:55] T280103: Request increased quota for library-upgrader Cloud VPS project - https://phabricator.wikimedia.org/T280103
[09:56:41] arturo: I didn't realise it was Christmas
[11:02:03] Reedy: :-)
[11:02:06] cheers
[12:22:59] legoktm: I will recheck the connection limits, but they are set pretty high on the DB. Nothing has really changed on my side, which is why I'm so confused.
[12:25:56] legoktm: my max connections are set to 10,000 and the process list only shows 14 active connections
[12:31:36] 10k max connections at what level? the default limits at the database server level are much lower, can't remember the exact value
[12:36:13] !log commons-corruption-checker create project T279246
[12:36:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Commons-corruption-checker/SAL
[12:36:17] T279246: Request creation of commons-corruption-checker VPS project (2) - https://phabricator.wikimedia.org/T279246
[12:56:31] Cyberpower678: how much network traffic are you using? you may be hitting some network ratelimit
[12:57:24] !log wikicommunityhealth allow dumps NFS share (T279558)
[12:57:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikicommunityhealth/SAL
[12:57:28] T279558: Mount Dumps NFS share on instances in the chm Cloud VPS project - https://phabricator.wikimedia.org/T279558
[13:04:02] arturo: not a lot really. This happens as soon as mysqli even tries to connect to the SQL server
[15:11:50] Is there any insight as to what could cause this? https://phabricator.wikimedia.org/T280215
[15:14:55] harej: there is literally one report (this one) and hundreds of php + mysql users in Toolforge. What does Occam's razor imply? :)
[15:15:18] Okay, good to know there isn't some bug I didn't know about
[15:28:35] bd808: That's useful to know. But I wonder what I did to break things this time.
[15:30:00] It's just strange that "mysql" in the shell gets to the DB just fine, but mysqli from the same location (granted, in some webservice exec node) doesn't
[15:30:21] And nothing code-wise has changed either.
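For context on the error itself: in the MySQL client/server protocol every packet begins with a 4-byte header (a 3-byte little-endian payload length followed by a 1-byte sequence id), and the client refuses a packet whose sequence id is not the one it expects. Because the failure here happens on connect, the packet being rejected is presumably the server's initial handshake, which the client expects to carry sequence id 0; that fits the "Expected 0 received 1" wording in the linked Stack Overflow question. The Python sketch below only illustrates that check; `parse_packet_header` and `check_sequence` are hypothetical helper names, the error text merely imitates the reported message, and none of this is mysqlnd's actual code or a diagnosis of the cause.

```python
from typing import Tuple

HEADER_SIZE = 4  # 3-byte payload length + 1-byte sequence id


def parse_packet_header(header: bytes) -> Tuple[int, int]:
    """Split a MySQL packet header into (payload_length, sequence_id)."""
    if len(header) != HEADER_SIZE:
        raise ValueError("a MySQL packet header is exactly 4 bytes")
    payload_length = int.from_bytes(header[:3], "little")  # little-endian 3-byte int
    sequence_id = header[3]  # indexing bytes yields an int in 0..255
    return payload_length, sequence_id


def check_sequence(header: bytes, expected: int) -> int:
    """Return the payload length, or raise if the sequence id is not `expected`."""
    payload_length, sequence_id = parse_packet_header(header)
    if sequence_id != expected:
        # Message wording imitates the error reported in this log (assumed format).
        raise ConnectionError(
            f"Packets out of order. Expected {expected} received {sequence_id}. "
            f"Packet size={payload_length}"
        )
    return payload_length


if __name__ == "__main__":
    # Hypothetical first packet on connect: 23-byte payload tagged with sequence id 1.
    try:
        check_sequence(bytes([23, 0, 0, 1]), expected=0)
    except ConnectionError as err:
        print(err)
```

The check is purely about ordering, so it says nothing about why an unexpected packet arrived; connection limits, a proxy, or a client/server mismatch could all produce the same symptom.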
[15:33:31] bd808: for the sake of testing, is it possible to change where my webservice is being hosted, just to eliminate the small possibility that one of the nodes may be borked?
[15:35:58] the node should not matter at all as the underlying docker container is the same on all the nodes
[15:37:01] some things don't change :)
[15:40:11] Okay
[15:40:24] * Cyberpower678 goes to rip his hair out.
[15:40:48] yuvipanda: if that was a poke at me, then yes. lol
[15:41:34] :D
[17:14:11] NGL, I'm getting more convinced that this isn't IABot, but something on the webservice. Doing more testing. I'm creating a test script.
[17:30:19] bd808: OK, I can DEFINITELY confirm that there is an issue with one of the webservices.
[17:30:41] I just wrote a test script and threw it into the public_html directory
[17:30:42] What if we just nuked the current webservice and created a new one?
[17:31:05] I tried that already, but I suspect I keep landing in the same one. I don't know how to check this.
[17:31:13] https://usercontent.irccloud-cdn.com/file/wQienLtC/image.png
[17:31:16] That's... interesting
[17:31:32] Cyberpower678: kubernetes or job gird?
[17:31:38] *grid
[17:31:38] When I ran the test script from php-cli, it executed to completion and exited successfully.
[17:31:52] On the webservice, it threw a fatal error.
[17:32:08] bd808: how do I check? There is no entry in qstat.
[17:32:31] qstat is for grid jobs
[17:32:41] So kubernetes I assume
[17:33:01] so you don't know how you are running your code?
[17:33:20] IABot is a webservice, isn't it? There would be a webservice log
[17:35:16] bd808: I know it runs on the toolforge webservice as it always has.
[17:35:26] But the issue appeared after a webservice restart.
[17:35:50] Outside of the webservice the code runs fine and mysqli successfully connects to the DB
[17:36:11] But within it, a strange "Packets out of order" error crops up.
[17:36:40] "webservice" is a python script that starts either grid engine jobs or kubernetes deployments. These two runtime environments have differences that are material to investigating this
[17:37:07] Well, the exact command I used is "webservice start"
[17:37:56] It prints a bunch of dots and that's it.
[17:38:08] and do you have a ~/service.template file or would that only use the default settings built into `webservice`?
[17:38:23] what does `webservice status` tell you?
[17:39:02] and what environment was your code running with before you shut it down last week or whenever?
[17:39:29] Your webservice of type php7.3 is running on backend kubernetes
[17:40:29] I was running PHP 7.3 last time, and in our last conversation I believe the bot was also running on kubernetes
[17:41:10] which last conversation?
[17:41:22] From a few months ago.
[17:41:33] The webservice has been untouched since then.
[17:41:36] so vague and unactionable
[17:41:45] I'm sorry.
[17:42:39] From my vantage point it's not easy to find out which environment my bot's webservice is currently running in.
[17:42:49] why is that?
[17:43:06] `webservice status` has existed for ~4 years
[17:43:24] and ~/service.manifest records the active state as well
[17:43:24] And it produces that one sentence I pasted above.
[17:43:43] I KNOW it has been running with PHP 7.3 for a while now.
[17:44:04] I'm almost certain it was on Kubernetes
[17:44:23] But beyond that, I'm not sure what environment info you need.
[17:45:36] !log tools cleared error state from tools-sgeexec-0920.tools.eqiad.wmflabs for a failed job
[17:45:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:45:50] bd808: The manifest file says I'm on Debian right now.
[17:45:59] Cyberpower678: I'm trying to help you figure out what changed. To do that we need before/after information. We now have some after information, but apparently before is lost to the sands of time
[17:47:37] bd808: TBH, I don't think anything has changed environment-wise. I can't say what distribution I was running on, but, that aside, IABot is mostly environment-agnostic.
[17:48:08] Cyberpower678: you stopped the webservice within the last ~2 weeks, correct?
[17:48:16] bd808: correct
[17:48:30] and then when you started it back up it was mysteriously broken?
[17:48:57] Yes
[17:49:32] and you have no proof of what runtime (grid or kubernetes) it was on when it was not broken?
[17:50:04] If kubernetes never shows up in qstat, then it was kubernetes, no doubt
[17:50:24] I can see that there is a ~/service.log stating "2021-03-25T18:48:17.120388 No running webservice job found, attempting to start it"
[17:50:26] I haven't seen a webservice job in there for a long while now.
[17:50:39] That log file is related to the grid and not kubernetes
[17:51:02] Is that a leftover bigbrother thing?
[17:51:23] so the grid watcher at least thought you were running the webservice on the grid a few weeks ago
[17:52:32] Huh. I definitely don't recall seeing any webservice jobs listed in the qstat output.
[17:53:05] I would suggest trying the webservice on the grid backend to see if it works differently. `webservice stop; webservice --backend=gridengine start`
[17:53:24] bd808: I'm sorry if I'm giving you a headache.
[17:53:33] that may or may not make it better but it will give you some more data
[17:53:48] Let me switch
[17:54:53] bd808: it works
[17:55:00] magic!
[17:55:04] It's executing successfully.
[17:55:28] so your code works on php7.2 (grid and bastion) and not php7.3 (kubernetes)
[17:55:32] bd808: you're my best friend here. :-)
[17:55:48] But IABot does work with PHP 7.3
[17:56:19] but not with the mysqli that is in our php7.3 apparently
[17:56:22] It's the version I use to actively develop IABot on my machine.
[17:57:09] I think it's still worth looking into at some point.
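The hard part of this exchange was the missing "before" information about the runtime. One cheap habit is to save the output of `webservice status` each time the service is (re)started and compare snapshots later. The Python sketch below assumes the status wording is exactly the single line quoted in this log ("Your webservice of type php7.3 is running on backend kubernetes"); other states may print something different, in which case `parse_status` returns `None`. The helper names `parse_status` and `diff_snapshots` and the earlier snapshot in the usage are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: the status wording matches the one line quoted in this log.
STATUS_RE = re.compile(
    r"Your webservice of type (?P<type>\S+) is running on backend (?P<backend>\S+)"
)


def parse_status(line: str) -> Optional[Dict[str, str]]:
    """Extract the webservice type and backend from a `webservice status` line."""
    match = STATUS_RE.search(line)
    if match is None:
        return None  # unrecognised wording, e.g. the service is not running
    return {"type": match.group("type"), "backend": match.group("backend")}


def diff_snapshots(
    before: Dict[str, str], after: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {field: (before, after)} for every field whose value changed."""
    changed = {}
    for key in sorted(set(before) | set(after)):
        if before.get(key) != after.get(key):
            changed[key] = (before.get(key), after.get(key))
    return changed


if __name__ == "__main__":
    now = parse_status("Your webservice of type php7.3 is running on backend kubernetes")
    # Hypothetical earlier snapshot, for illustration only.
    earlier = {"type": "php7.2", "backend": "gridengine"}
    if now is not None:
        print(diff_snapshots(earlier, now))
```

With a saved snapshot like this, the "was it grid or kubernetes?" question above becomes a lookup instead of a recollection.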
[17:58:03] bd808: I wonder if it's perhaps a network issue from kubernetes to cyberbot-db-01
[17:59:12] If everyone else's tools are working just fine, and I'm getting "Packets out of order" when trying to connect to it, maybe something funky is happening with routing and/or I/O to and from it.
[18:00:33] After all, my DB lives somewhere other than where most Toolforge users' DBs do.
[18:01:21] But in any event, thank you for that quick fix.
[19:26:17] !log tools.lexeme-forms deployed 051e3789a2 (l10n updates)
[19:26:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lexeme-forms/SAL