Zoredache

Development project: Optimize Autorank leaderboard generation


Problem:  Autorank includes a feature to display the leaderboard of top (n) players sorted by maximum time played.  Unfortunately, when a player runs the /ar leaderboard command, and the cached data has expired, the TPS of the server drops really low (worst I have seen was ~15 -> ~3).

My hypothesis is that this happens because Autorank uses a horrible method/algorithm for updating the leaderboard.  Unfortunately I am not great with Java, so I don't know how to profile the code to prove my hypothesis.  If you can prove I am wrong about the bottleneck and propose an alternate fix for this code, that would also be great.

Code: https://github.com/Armarr/Autorank-2/blob/master/src/me/armar/plugins/autorank/leaderboard/Leaderboard.java

So if I am reading the code right, the updateLeaderboard() method gets a copy of the list of all the players (who have ever played), sorts the **entire** list, then displays the top (n) players.  This seems like a huge mistake since we have about 15,000 players in our Autorank data file.  I suspect this copy-then-sort is what is killing the performance.

Instead, I think the code should be using a priority queue ( https://en.wikipedia.org/wiki/Priority_queue ), which by my understanding basically works like this:  build a queue of size (n) and iterate over the items to be ranked.  If, and only if, the current item is larger than the smallest item in the queue, add it to the queue and drop that smallest item so the queue stays at size (n).  Skip all items that are not bigger than the smallest item in the queue.  With a priority queue you are never sorting a huge set of items; you are at most reordering an (n)-sized set of items a few times.
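To make the idea concrete, here is a minimal Java sketch of that bounded priority queue. It is not taken from Autorank: the class name `TopNSketch`, the method `topN`, and the use of a plain `Map<String, Integer>` of player name to minutes played are all assumptions for illustration, and the real plugin stores its data differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Autorank.
final class TopNSketch {

    /**
     * Returns the n entries with the largest values, highest first.
     * The heap never holds more than n entries, so the full list is never sorted.
     * The map must not be modified while this method runs.
     */
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> minutesPlayed, int n) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }

        // Min-heap ordered by time played: peek() is the smallest entry kept so far.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(n, Map.Entry.<String, Integer>comparingByValue());

        for (Map.Entry<String, Integer> entry : minutesPlayed.entrySet()) {
            if (heap.size() < n) {
                // Queue not full yet: always keep the entry.
                heap.add(entry);
            } else if (entry.getValue() > heap.peek().getValue()) {
                // Larger than the current smallest: evict the smallest, keep this one.
                heap.poll();
                heap.add(entry);
            }
            // Otherwise skip the entry entirely.
        }

        // Only the n survivors get sorted for display.
        result.addAll(heap);
        result.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return result;
    }
}
```

Calling `TopNSketch.topN(times, 5)` would give the five entries to display. Whether this is actually the bottleneck still needs to be confirmed by profiling.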

 

  • The code is at ( https://github.com/Armarr/Autorank-2 ).  I expect your finished product to be completed in a way that can be submitted as a pull request to the upstream developer.  I do not want to maintain an Autorank fork!
    • You should follow the coding style of the existing project.  Same indenting, line ending style, braces and so on.
  • I can provide access to a VM for you to test on that will be similar to the production server.  You will not have access to the production server for this project.
    • You will need to know some Linux, and be able to provide me the public portion of an SSH key for authentication.
  • I expect to get a link to a git repo for the code to be compiled.  I will not be using a pre-compiled jar file on the production server, so it will be critical for you to be sure your code compiles cleanly on systems other than your own.


If you seriously plan on working on this project please post on this thread, and post frequent updates.  If you decide you are no longer interested in helping after you have posted, please post that as an update also.

 


I'll save the theory explanation for now, but I don't think that it's caused by using the wrong data structure. He's running this method in another thread, so it shouldn't be blocking your main thread. The only time it looks like it might lag is if the "force" argument is added, but we can look at that after this issue is solved.

	public void sendLeaderboard(final CommandSender sender, final dataType type) {
		if (shouldUpdateLeaderboard(type)) {
			// Update leaderboard because it is not valid anymore.
			// Run async because it uses UUID lookup
			plugin.getServer().getScheduler().runTaskAsynchronously(plugin, new Runnable() {
				@Override
				public void run() {
					updateLeaderboard(type);

					// Send them afterwards, not at the same time.
					sendMessages(sender, type);
				}
			});
		} else {
			// send them instantly
			sendMessages(sender, type);
		}
	}

It's still fishy that it lags when you run the command, but let's make sure we're looking in the right place. Set plugin-profiling in your bukkit.yml to true, then run /timings on, then /ar leaderboard, and finally run /timings paste. It will profile the plugins and might help us determine which event is causing the lag. It will list all of the plugins the server runs and profile each one of them (based on events if I remember properly). If you want to, PM me the links and I'll see if I find anything weird in them. We can go from there once that's done. :)

17 hours ago, Hydrosis said:

It's still fishy that it lags when you run the command, but let's make sure we're looking in the right place. Set plugin-profiling in your bukkit.yml to true, then run /timings on, then /ar leaderboard, and finally run /timings paste. It will profile the plugins and might help us determine which event is causing the lag. It will list all of the plugins the server runs and profile each one of them (based on events if I remember properly). If you want to, PM me the links and I'll see if I find anything weird in them. We can go from there once that's done. :)

Perhaps I don't know how to read the timings, but that isn't telling me much.  I do know that I had htop running on the server, and the load average jumped from ~0.7 up to ~1.8 when I ran the command.  It took almost 4 minutes to generate the leaderboard on my dev instance, which has a copy of the Autorank data from our SMP.

 

    $ # remove cached leaderboard
    $ rm /srv/mc/ar/data/plugins/Autorank/internalprops.yml
    $ /srv/mc/ar/start.sh
    ... Lots of bootup spam

    [21:13:51 INFO]: [Autorank] Config files have been correctly setup!
    >timings on
    [21:14:00 INFO]: Enabled Timings & Reset
    >ar leaderboard
    [21:15:48 INFO]: [Statz debug] Save Statz database.
    [21:17:49 INFO]: [Statz debug] Save Statz database.
    [21:17:54 INFO]: -------- Leaderboard (All time) --------
    [21:17:54 INFO]: 1 | raethe - 107 days, 22 hours and 5 minutes.
    [21:17:54 INFO]: 2 | onmy412ish - 105 days, 1 hour and 30 minutes.
    [21:17:54 INFO]: 3 | zoredache - 102 days, 15 hours and 10 minutes.
    [21:17:54 INFO]: 4 | lordlucifer - 98 days, 5 hours and 0 minutes.
    [21:17:54 INFO]: 5 | impling - 95 days, 1 hour and 50 minutes.
    [21:17:54 INFO]: ------------------------------------
    >timings paste
    [21:18:03 INFO]: Timings results can be viewed at https://www.spigotmc.org/go/timings?url=vobeyiquwa

 

If you are interested in working on this, send me an SSH public key, and I'll give you access to this instance on my test machine.

 


On the Linux side, is the sysstat package installed on the server running ar? If so, you can run iostat (below) about half a minute before kicking off the leaderboard generation.  It will capture high-level IO information, and if the process is thrashing the disks and waiting on IO you'll see a spike in %iowait and an increased avgqu-sz. The command below collects 150 samples with 2s between each (i.e. 5 mins). iostat is production safe as it's "passive" and only reads the data from /proc.

    iostat -x 2 150 > /tmp/[some-temp-file]

Just thought some more - is the data local or remote, and in a MySQL DB or filesystem? If stored in a remote MySQL DB then the above wouldn't tell you anything useful unless you ran it on the MySQL box.

Edited by SteelyEyed
Remote MySQL?
On 10/10/2016 at 8:54 PM, SteelyEyed said:

Just thought some more - is the data local or remote, and in a MySQL DB or filesystem? If stored in a remote MySQL DB then the above wouldn't tell you anything useful unless you ran it on the MySQL box.

AutoRank data (per server) is stored in the local file system. Only global time data is stored in a MySQL database (also local). 


When I did some debugging with Zore, it looked like the lag was coming from reading from the disk. It wasn't blocking the main thread, but it was CPU intensive. I recommended that Zore ask the developer to add SQL support for everything except config files.

Edited by Hydrosis


Good to know, thanks for updating.

I took a quick look at the data files. If there is interest, it should be possible to use "last seen" dates to filter out and remove rows for accounts that have been inactive for a long time and have under some number of minutes of play time; rows could be added back if an account came back. Kinda like manually moving data to and from cold storage, and it would probably be enough to do that move daily. Honestly not sure how that would affect the global times, but a smaller number of rows would reduce the IO demand.
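A rough Java sketch of that cold-storage split, just to show the shape of the filter. Everything here is hypothetical: `PlayerRecord`, its fields, and the thresholds are made up for illustration and do not reflect Autorank's actual data file layout.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Autorank.
final class ColdStorageSketch {

    /** Hypothetical row: one player with play time and last-seen date (days since epoch). */
    static final class PlayerRecord {
        final String uuid;
        final int minutesPlayed;
        final long lastSeenEpochDay;

        PlayerRecord(String uuid, int minutesPlayed, long lastSeenEpochDay) {
            this.uuid = uuid;
            this.minutesPlayed = minutesPlayed;
            this.lastSeenEpochDay = lastSeenEpochDay;
        }
    }

    /** A row is "cold" only if it is both long inactive and below the play-time threshold. */
    static boolean isCold(PlayerRecord r, long todayEpochDay, int maxIdleDays, int minMinutes) {
        return (todayEpochDay - r.lastSeenEpochDay) > maxIdleDays
                && r.minutesPlayed < minMinutes;
    }

    /** Removes cold rows from the active list and returns them so they can be archived. */
    static List<PlayerRecord> extractCold(List<PlayerRecord> active, long todayEpochDay,
                                          int maxIdleDays, int minMinutes) {
        List<PlayerRecord> cold = new ArrayList<>();
        Iterator<PlayerRecord> it = active.iterator();
        while (it.hasNext()) {
            PlayerRecord r = it.next();
            if (isCold(r, todayEpochDay, maxIdleDays, minMinutes)) {
                it.remove();
                cold.add(r);
            }
        }
        return cold;
    }
}
```

Moving a returning player back is just the reverse: look the UUID up in the archived list and re-add the row. Whether Autorank tolerates its data file being changed this way is a separate question that this sketch does not answer.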

2 hours ago, SteelyEyed said:

I took a quick look at the data files. If there is interest, it should be possible to use "last seen" dates to filter out and remove rows for accounts that have been inactive for a long time and have under some number of minutes of play time; rows could be added back if an account came back. Kinda like manually moving data to and from cold storage, and it would probably be enough to do that move daily. Honestly not sure how that would affect the global times, but a smaller number of rows would reduce the IO demand.

Have you looked to see if that is something the plugin would actually support?  Sure, I could delete things from the data file, but can I do it while the server is running without corrupting things, or without the plugin just re-writing everything I removed because it had cached the 'current' state in memory?

For lots of plugins, changing things while the server is running will either get your changes lost or break things.  Only a few plugins support it, and usually you have to send a console command to tell the plugin to reload the config/data files.

 


Haha, I just stumbled upon this. I was asking myself the same question: 'How can I optimise the leaderboard?'. Unfortunately, it's not as easy as it seems :(

I would gladly take any advice you guys can come up with.

4 hours ago, Staartvin said:

Haha, I just stumbled upon this. I was asking myself the same question: 'How can I optimise the leaderboard?'. Unfortunately, it's not as easy as it seems :(

I would gladly take any advice you guys can come up with.

I don't really know Java well enough to solve the problem, but a player here managed to bang out a Python script that can sort the data file, look up the UUIDs from Essentials, and store the result in a location we can use to feed this web-based leaderboard.  What is really funny is that the Python script runs in about 400ms, but when Autorank generates the leaderboard it takes something like 2-3 minutes.  We have something like 15,000 players in our data file.

https://players.diversitysmp.com/arlb/leaderboard.php

Of course the real answer is that you should probably strongly consider using a SQL database for both the per-server and global data.  That way a 'leaderboard update' would be a simple query like 'select totaltime, playername, uuid from table order by totaltime desc'.  That is, let the database server handle it.  Sorting data is what database servers are good at.

 

On 1/26/2017 at 2:31 PM, Zoredache said:

Of course the real answer is that you should probably strongly consider using a SQL database for both the per-server and global data.  That way a 'leaderboard update' would be a simple query like 'select totaltime, playername, uuid from table order by totaltime desc'.  That is, let the database server handle it.  Sorting data is what database servers are good at.

I like this idea. Not only would it be good for the leaderboard, but it'd be good for storing and calculating global AR time: have a separate table for each server. You won't need a table for global time; just make the plugin calculate global time by adding all the server tables together. The servers would need to be listed in a config file on each server though, so that'd make it a bit more complicated for the end user. Perhaps look into making this plugin work on the Bungeecord proxy instead? But then that will mess with the AFK integration...

I'm just throwing around random suggestions :P 
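For what it's worth, the "add all the server tables together" part could look roughly like this in Java. This is only a sketch under assumptions: each server's table is stood in for by a `Map<String, Integer>` of player UUID to minutes played, and `GlobalTimeSketch` and `globalTimes` are made-up names, not anything in Autorank.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Autorank.
final class GlobalTimeSketch {

    /**
     * Sums each player's minutes across all per-server maps.
     * A player missing from a server simply contributes nothing for that server.
     */
    static Map<String, Integer> globalTimes(List<Map<String, Integer>> perServerTimes) {
        Map<String, Integer> global = new HashMap<>();
        for (Map<String, Integer> server : perServerTimes) {
            for (Map.Entry<String, Integer> e : server.entrySet()) {
                // merge() inserts the value if the key is absent, otherwise adds to it.
                global.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return global;
    }
}
```

With real SQL tables the database could do the same sum itself, so treat this as an illustration of the calculation rather than a suggestion to do it in the plugin.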


Looks like the new update to AutoRank includes a leaderboard improvement:

Quote

Massively improved performance of leaderboard (by 99.95%). Before this update, a server with 60.000 players stored in Autorank's database could take 30 minutes to update the leaderboard. Now, it takes about 1 second. This possibly solves leaderboards not showing correctly.

Will need to upgrade and test this out.

This topic is now closed to further replies.
