I should add that this isn’t the first time this has happened, but it is the first time since I reduced the allocation of RAM for PostgreSQL in the configuration file. I swore that that was the problem, but I guess not. It’s been almost a full week without any usage spikes or service interruptions of this kind, but all of a sudden, my RAM and CPU are maxing out again at regular intervals. When this occurs, the instance is unreachable until the issue resolves itself, which seemingly takes 5-10 minutes.

On a seven-day graph, the usage spikes only start today, and they are far above my idle usage.

I thought the issue was something to do with Lemmy periodically fetching some sort of remote data and slamming the database, which is why I reduced the RAM allocation for PostgreSQL to 1.5 GB instead of the full 2 GB. As you can see in the above graph, my idle resource utilization is really low. Since it’s probably cut off from the image, I’ll add that my disk utilization is currently 25-30%. Everything seemed to be in order for basically an entire week, but this problem showed up again.

Does anyone know what is causing this? Clearly, something is happening that is loading the server more than usual.

  • babbiorsetto@lemmy.orefice.win · 1 year ago

    I had the same thing happen: CPU usage maxed out, and I couldn’t even SSH in to fix it, so I had to reboot from the AWS console. The logs don’t show anything unusual apart from Postgres restarting 30 minutes into the spike, possibly because the system killed it.

    You say yours resolved itself in 10 minutes; mine didn’t seem to stop even after 2 hours, so I rebooted. It could be that my VPS only has 1 CPU and 1 GB of RAM, so it took longer doing whatever it was doing.

    I’ve now set up RAM and CPU limits following this question, plus an alert, so that I can hopefully SSH in and figure out what’s happening while it’s happening.

    Any suggestions on what I should be looking at if I manage to get into the system?
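One way to have something to look at even when SSH is unreachable is to record snapshots automatically while the spike is in progress. The following is a minimal sketch, assuming a Linux host with `/proc` and Python 3: it reads `/proc/meminfo` and `/proc/loadavg` and, when memory use crosses a threshold, appends the five largest processes by resident memory to a log. The 90% threshold, the log path, and all function names are hypothetical choices, not part of Lemmy or Docker.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value} (values in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def mem_used_percent(info):
    """Percentage of RAM not available for new allocations (needs MemAvailable)."""
    total = info["MemTotal"]
    return 100.0 * (total - info["MemAvailable"]) / total


def top_rss(proc_root="/proc", limit=5):
    """Return up to `limit` (rss_kb, name, pid) tuples, largest RSS first."""
    procs = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, pid, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # the process exited while we were scanning
        rss = fields.get("VmRSS")  # kernel threads have no VmRSS line
        if rss:
            procs.append((int(rss.split()[0]), fields["Name"].strip(), int(pid)))
    return sorted(procs, reverse=True)[:limit]


def snapshot(threshold=90.0, log_path="/var/log/resource-snapshots.log"):
    """Append a snapshot to log_path if memory use is at or above threshold."""
    with open("/proc/meminfo") as f:
        used = mem_used_percent(parse_meminfo(f.read()))
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])  # 1-minute load average
    if used < threshold:
        return False
    with open(log_path, "a") as log:
        log.write(f"{time.ctime()} mem={used:.1f}% load1={load1}\n")
        for rss_kb, name, pid in top_rss():
            log.write(f"  {pid:>7} {name:<20} {rss_kb // 1024} MB\n")
    return True


if __name__ == "__main__":
    snapshot()
```

Running it from cron once a minute (hypothetically `* * * * * /usr/bin/python3 /root/snapshot.py`) means the log survives a reboot. On a machine that is already thrashing, cron may itself be delayed, so expect gaps during the worst of it.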

    • babbiorsetto@lemmy.orefice.win · 1 year ago

      It just happened again. I couldn’t SSH in despite the Docker resource limits, which leads me to believe it may not be related to Docker or Lemmy.

      This time it lasted only 20 minutes or so. Once it was over, I could log back in and investigate a little. There isn’t much to see, except that lemmy-ui was killed sometime during the event (it is the only container that restarted recently):

      IMAGE                        COMMAND                  CREATED      STATUS         PORTS                                              
      nginx:1-alpine               "/docker-entrypoint.…"   9 days ago   Up 25 hours    80/tcp, 0.0.0.0:14252->8536/tcp, :::14252->8536/tcp
      dessalines/lemmy-ui:0.18.0   "docker-entrypoint.s…"   9 days ago   Up 3 minutes   1234/tcp                                              
      dessalines/lemmy:0.18.0      "/app/lemmy"             9 days ago   Up 25 hours                                                         
      asonix/pictrs:0.4.0-rc.7     "/sbin/tini -- /usr/…"   9 days ago   Up 25 hours    6669/tcp, 8080/tcp                                    
      mwader/postfix-relay         "/root/run"              9 days ago   Up 25 hours    25/tcp                                                
      postgres:15-alpine           "docker-entrypoint.s…"   9 days ago   Up 25 hours
      

      I still have no idea what’s going on.
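If the kernel's out-of-memory (OOM) killer is what took down lemmy-ui here (and possibly Postgres in the earlier report), it normally leaves a line in the kernel log. The following is a minimal sketch that scans kernel log text piped in from `journalctl -k` or `dmesg`. The regular expression is an assumption based on the wording recent Linux kernels use ("Out of memory: Killed process 1234 (name)" for a system-wide OOM and "Memory cgroup out of memory: ..." when a container or cgroup limit is hit); older kernels say "Kill process" instead, and the exact wording varies by version.

```python
import re
import sys

# Matches both "Killed process 1234 (name)" (newer kernels) and
# "Kill process 1234 (name) score ..." (older kernels).
OOM_RE = re.compile(r"Kill(?:ed)? process (\d+) \(([^)]*)\)")


def find_oom_kills(lines):
    """Yield (pid, process_name, scope) for every OOM-killer line found."""
    for line in lines:
        match = OOM_RE.search(line)
        if match is None:
            continue
        # A "Memory cgroup" prefix means a cgroup (e.g. Docker) memory limit
        # was hit, rather than the whole machine running out of RAM.
        scope = "cgroup" if "Memory cgroup" in line else "system"
        yield int(match.group(1)), match.group(2), scope


if __name__ == "__main__":
    for pid, name, scope in find_oom_kills(sys.stdin):
        print(f"{scope:>6} OOM kill: pid {pid} ({name})")
```

For example, `journalctl -k --since "2 hours ago" | python3 oom_scan.py` (the script name is hypothetical). After a reboot, `journalctl -k -b -1` shows the previous boot's kernel log, provided the journal is persistent.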

    • jbernardini@boulder.ly · 1 year ago

      I rebooted about 5 minutes into it. I’m running a t2.micro instance, but it went straight back to high CPU usage after the reboot, and I was still unable to SSH in for another 5 minutes. I just rebooted it again to be sure, and it was available almost immediately.

    • EuphoricPenguin@normalcity.life (OP) · 1 year ago

      I’ll save this to look at later, but I did use PGTune to set the total RAM allocation for PostgreSQL to 1.5 GB instead of 2 GB. I thought this solved the problem at first, but the problem is back, and my config is still at 1.5 GB (set in megabytes, to something like 1536 MB, to avoid confusion).
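It may help to spell out what that 1.5 GB figure actually controls. The memory total given to PGTune is an input used to size individual settings, not a cap that PostgreSQL enforces: `shared_buffers` is allocated once, but `work_mem` can be used by every sort or hash operation in every connection at the same time. The sketch below assumes PGTune's "web application" ratios are roughly 1/4, 3/4 and 1/16 of the declared RAM (compare with the values PGTune actually generated), and `sorts_per_query` is a hypothetical parameter for a crude peak estimate. The 4 MB `work_mem` and 100 connections are PostgreSQL's defaults, not values taken from this config.

```python
def pgtune_style_settings(total_mb):
    """Approximate PGTune 'web application' ratios (assumed, not authoritative)."""
    return {
        "shared_buffers_mb": total_mb // 4,            # allocated once at startup
        "effective_cache_size_mb": total_mb * 3 // 4,  # planner hint; allocates nothing
        "maintenance_work_mem_mb": total_mb // 16,     # per VACUUM / CREATE INDEX
    }


def rough_peak_mb(shared_buffers_mb, connections, work_mem_mb, sorts_per_query):
    """Crude peak estimate: shared memory plus concurrent sort/hash memory.

    Ignores per-backend overhead, maintenance_work_mem and hash_mem_multiplier,
    so it is an illustration rather than a real upper bound.
    """
    return shared_buffers_mb + connections * work_mem_mb * sorts_per_query


if __name__ == "__main__":
    settings = pgtune_style_settings(1536)
    print(settings)  # {'shared_buffers_mb': 384, 'effective_cache_size_mb': 1152, 'maintenance_work_mem_mb': 96}
    print(rough_peak_mb(settings["shared_buffers_mb"], 100, 4, 3))  # 1584
```

With these illustrative numbers, the estimate already exceeds the 1536 MB fed to the tuner. If the database really is what eats the RAM during a spike, the hard ceiling has to come from elsewhere, for example a Docker memory limit on the postgres container.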