Julie Cover, SRND
1. The Problem, and Others Like It
2. My Solution + Algorithm Design
3. Lessons from Another Iteration
for id, project in all_project_data.items():
    if project["proj_size_remaining"] == project["num_first_choice"]:
        all_project_data, student_placements = place_students_of_choice(
            all_project_data, student_placements, id, 1, project["proj_size_remaining"]
        )
for id, project in all_project_data.items():
    if project["proj_size_remaining"] >= project["num_first_choice"]:
        all_project_data, student_placements = place_students_of_choice(
            all_project_data, student_placements, id, 1, project["proj_size_remaining"]
        )
_all_project_data = deepcopy(all_project_data)
for id, project in _all_project_data.items():
    if project["proj_size_remaining"] >= project["num_first_choice"]:
        all_project_data, student_placements = place_students_of_choice_balanced(
            all_project_data, student_placements, id, [2, 15], project["proj_size_remaining"]
        )
_all_project_data = deepcopy(all_project_data)
for id, project in _all_project_data.items():
    all_project_data, student_placements = place_students_of_choice_balanced(
        all_project_data, student_placements, id, [1, 2], project["proj_size_remaining"]
    )
for i in range(3, 16, 4):
    _all_project_data = deepcopy(all_project_data)
    for id, project in _all_project_data.items():
        if project["proj_size_remaining"] >= project["num_first_choice"]:
            all_project_data, student_placements = place_students_of_choice_balanced(
                all_project_data,
                student_placements,
                id,
                [i, i + 1, i + 2, i + 3],
                project["proj_size_remaining"],
            )
AKA Gale–Shapley algorithm
(with a side of algorithmic design tips)
0. Data Collection
1. Elastic and Suggestions
2. APIs n' Stuff
3. Placement Algorithm
4. Manual Verification
const mentorSchema = {
  mentor_id: "",
  name: "",
  company: "",
  bio: "",
  backgroundRural: true,
  preferStudentUnderRep: 2, // (0-2)
  okExtended: true,
  timezone: -7,
  preferToolExistingKnowledge: true,
  proj_id: "",
  proj_description: "",
  proj_tags: [""],
  studentsSelected: 2,
};
const studentSchema = {
  id: "",
  name: "",
  rural: false,
  underrepresented: false,
  requireExtended: true,
  timezone: -3,
  interestCompanies: [""],
  interestTags: [""],
  beginner: true,
};
Elastic is great, but also the worst
# Lines 56-74
company_q = None
for company in student["interestCompanies"]:
    if company_q is None:
        company_q = Q(
            "function_score",
            query=Q("fuzzy", company=company),
            weight=company_score,
            boost_mode="replace",
        )
    else:
        company_q = company_q | Q(
            "function_score",
            query=Q("fuzzy", company=company),
            weight=company_score,
            boost_mode="replace",
        )
# 111-113
combined_query = Q(
    "function_score",
    query=combined_query,
    functions=SF(
        "gauss",
        numStudentsSelected={
            "origin": 0,
            "scale": 3,
            "offset": 3,
            "decay": 0.50,
        },
    ),
)
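To make the `gauss` parameters above concrete, here is a small sketch of the Gaussian decay formula as described in the Elasticsearch function score documentation, written in plain Python. The function name `gauss_decay` is made up for this illustration and is not part of any library; the default arguments simply copy the values from the query.

```python
import math


def gauss_decay(value, origin=0.0, scale=3.0, offset=3.0, decay=0.5):
    """Approximate the multiplier Elasticsearch's gauss decay would produce.

    Hypothetical helper for illustration only; defaults mirror the query above.
    """
    # sigma^2 is chosen so the multiplier equals `decay` at distance `scale`
    # beyond the offset.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    # Values within `offset` of the origin are not penalised at all.
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


# Multiplier for projects that already have 0, 3, 6 and 9 students selected.
multipliers = [gauss_decay(n) for n in (0, 3, 6, 9)]
```

Under these settings a project keeps its full score until three students have selected it, drops to about half at six, and to roughly 6% at nine.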
combined_query = Q(
    "function_score",
    query=combined_query,
    functions=[
        SF(
            "script_score",
            script={
                "source": """
                    int student_tz = params.student_tz;
                    int mentor_tz = 0;
                    // Null check. Even though timezone is required, somehow some
                    // null rows snuck in and bamboozled me
                    if (doc['timezone'].size() == 0) {
                        mentor_tz = 0;
                    } else {
                        mentor_tz = (int)doc['timezone'].value;
                    }
                    int diff = (int)Math.abs(student_tz - mentor_tz);
                    boolean mentor_ok_tz_diff = false;
                    if (doc['okTimezoneDifference'].size() == 0) {
                        mentor_ok_tz_diff = false;
                    } else {
                        mentor_ok_tz_diff = doc['okTimezoneDifference'].value;
                    }
                    if (mentor_ok_tz_diff == true) {
                        if (student_tz > 0) {
                            // Mentor is OK with the time difference and student has a large time difference
                            return 1;
                        } else {
                            // Mentor is ok with time difference and student has a normal time
                            return 0.75;
                        }
                    } else {
                        if (diff <= 2) {
                            // Mentor is not ok with time difference and student has normal time
                            return 1;
                        } else if (diff == 3) {
                            return 0.75;
                        } else {
                            // Mentor is not ok with time difference and student has weird time
                            return 0;
                        }
                    }
                """,
                "params": {"student_tz": student["timezone"]},
            },
        )
    ],
    boost_mode="multiply",
    score_mode="sum",
)
1. Painless Scripts
2. Code in strings
3. Lack of Readability
4. Debugging is a disaster
5. Breaks the explain function
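One possible way to soften the debugging pain is to keep a plain-Python mirror of the script's branching and unit-test that instead. The sketch below copies the timezone logic from the Painless script shown earlier; the function name `timezone_multiplier` is hypothetical, and `None` stands in for a missing document field.

```python
def timezone_multiplier(student_tz, mentor_tz, mentor_ok_tz_diff):
    """Python mirror of the Painless timezone script (illustrative only)."""
    # Missing fields fall back to the same defaults the script uses.
    mentor_tz = 0 if mentor_tz is None else int(mentor_tz)
    ok = False if mentor_ok_tz_diff is None else bool(mentor_ok_tz_diff)
    diff = abs(student_tz - mentor_tz)

    if ok:
        # Same branch as the script: it keys on the sign of the student's offset.
        return 1.0 if student_tz > 0 else 0.75
    if diff <= 2:
        return 1.0
    if diff == 3:
        return 0.75
    return 0.0


# A few spot checks with the sample offsets used elsewhere in the slides.
checks = [
    timezone_multiplier(-3, -7, False),   # 4 hours apart, mentor not flexible
    timezone_multiplier(-4, -7, False),   # 3 hours apart
    timezone_multiplier(-7, -7, False),   # same timezone
    timezone_multiplier(5, -7, True),     # flexible mentor, positive offset
    timezone_multiplier(-3, None, True),  # missing mentor timezone
]
```

The mirror only helps if it is kept in sync with the script by hand, so it is a trade-off rather than a fix.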
It's super easy!
Great for APIs!
@app.route("/matches/<student_data>", methods=["GET"])
def matches(student_data):
    try:
        data = decode(student_data, current_app.jwt_key, algorithms=["HS256"])
    except exceptions.DecodeError:
        raise Unauthorized("Something is wrong with your JWT Encoding.")
    ela_resp = evaluate_score(data, current_app.elasticsearch, 25)
    resp = [
        {"score": hit._score, "project": hit._source.to_dict()}
        for hit in ela_resp.hits.hits
    ]
    return json.dumps(resp)
# This is needed to allow the other libraries to import database,
# as python doesn't check in the parent directory otherwise.
currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
sys.path.insert(0, parentdir)
# Store any Object on the App object!!
app.elasticsearch = Elasticsearch(elastic_host)
app.jwt_key = os.getenv("JWT_KEY")
-----------------
from flask import current_app
print(current_app.jwt_key)
# Sample GET request input (a Python dict, before JWT encoding)
request_data = {
    "id": str(uuid.uuid4()),
    "name": "John Peter",
    "rural": True,
    "underrepresented": False,
    "timezone": -4,
    "interestCompanies": ["Microsoft", "Google"],
    "interestTags": ["Backend", "Data", "python", "php"],
    "requireExtended": False,
    "track": "Advanced",
}
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6IjE5NWY2ZDI5LTYyMjYtNDUyNC05ODJmLTc5M2ZhMThlN2ViOSIsIm5hbWUiOiJKb2huIFBldGVyIiwicnVyYWwiOnRydWUsInVuZGVycmVwcmVzZW50ZWQiOmZhbHNlLCJ0aW1lem9uZSI6LTQsImludGVyZXN0Q29tcGFuaWVzIjpbIk1pY3Jvc29mdCIsIkdvb2dsZSJdLCJpbnRlcmVzdFRhZ3MiOlsiQmFja2VuZCIsIkRhdGEiLCJweXRob24iLCJwaHAiXSwicmVxdWlyZUV4dGVuZGVkIjpmYWxzZSwidHJhY2siOiJBZHZhbmNlZCJ9.kZ_LsGQrno3kL5Gm_M9WD1ttFsHz4BO32aAZvAYt5n0
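For readers unfamiliar with what a token like the one above contains, here is a standard-library sketch of HS256 signing and verification. It is not the service's code (the route shown earlier calls a library `decode` function); the helper names `encode_hs256` and `decode_hs256` and the key `"demo-key"` are assumptions made up for this example.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(signing_input: str, key: str) -> bytes:
    return hmac.new(key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_hs256(payload: dict, key: str) -> str:
    """Build header.payload.signature for an HS256 token (illustrative only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + b64url(sign(signing_input, key))


def decode_hs256(token: str, key: str) -> dict:
    """Check the signature, then return the payload as a dict."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(b64url_decode(signature), sign(signing_input, key)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(signing_input.split(".")[1]))


# Hypothetical student payload and key, for demonstration only.
student = {"name": "John Peter", "timezone": -4, "interestTags": ["Backend", "Data"]}
token = encode_hs256(student, "demo-key")
decoded = decode_hs256(token, "demo-key")
```

The payload is only encoded, not encrypted, so anyone holding the token can read the student data; the key only protects against tampering.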
1. Start with projects that already have exactly the right number of first-choice votes. Assign those students by adding their information to the saved project dictionary. Remove those students' votes from all other projects to avoid duplicates, then remove the filled projects.
2. Then, for projects with fewer first-choice votes than they need, assign the students who voted for them as first choice. Also remove those students' votes from other projects. Decrement `proj_size_remaining`.
3. Then, assign second-, third-, and lower-choice votes as needed until `proj_size_remaining` = 0, then remove the project. Do this in order (all second-choice votes, then all third-choice votes, and so on) so that each student gets the best-ranked choice still available. If multiple students are tied, assign the one with the fewest votes left in other projects first. Also remember to remove the student from all other projects when their vote is saved.
4. Once all projects with fewer first-choice votes than needed are dealt with, only projects that started with more than enough first-choice votes remain. Because of how students have been removed, these should have exactly the correct number of first-choice votes left. Assign these students, and complain loudly if something is wrong.
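The four steps above can be sketched compactly in Python. This is a simplified illustration, not the code from the earlier slides: the function `place_students` and its inputs (`sizes` mapping project id to capacity, `rankings` mapping student id to an ordered list of project ids) are assumptions chosen to keep the example self-contained.

```python
def place_students(sizes, rankings):
    """Greedy placement following the four steps (illustrative sketch)."""
    remaining = dict(sizes)  # open seats per project
    placements = {}          # student id -> project id

    def voters(project, rank):
        """Unplaced students who ranked `project` at 0-based position `rank`."""
        return [s for s, prefs in rankings.items()
                if s not in placements and len(prefs) > rank and prefs[rank] == project]

    def votes_elsewhere(student, project):
        """How many other still-open projects the student voted for."""
        return sum(1 for p in rankings[student] if p != project and remaining.get(p, 0) > 0)

    def assign(project, students):
        for s in students:
            placements[s] = project  # placed students no longer appear in voters()
        remaining[project] -= len(students)

    first_votes = {p: len(voters(p, 0)) for p in sizes}
    exact = [p for p in sizes if first_votes[p] == sizes[p]]
    under = [p for p in sizes if first_votes[p] < sizes[p]]
    over = [p for p in sizes if first_votes[p] > sizes[p]]

    for p in exact:  # Step 1: projects whose first-choice votes fit exactly
        assign(p, voters(p, 0))
    for p in under:  # Step 2: undersubscribed projects take their first choices
        assign(p, voters(p, 0))

    # Step 3: fill undersubscribed projects rank by rank, breaking ties in
    # favour of students with the fewest votes left elsewhere.
    max_rank = max((len(prefs) for prefs in rankings.values()), default=0)
    for rank in range(1, max_rank):
        for p in under:
            if remaining[p] > 0:
                tied = sorted(voters(p, rank), key=lambda s: votes_elsewhere(s, p))
                assign(p, tied[:remaining[p]])

    for p in over:  # Step 4: oversubscribed projects should now fit exactly
        left = voters(p, 0)
        if len(left) != remaining[p]:
            raise ValueError(f"project {p}: {len(left)} voters for {remaining[p]} seats")
        assign(p, left)
    return placements


example = place_students(
    {"A": 2, "B": 1, "C": 1},
    {"s1": ["A", "B"], "s2": ["A", "C"], "s3": ["B", "A"], "s4": ["A", "C"]},
)
```

In this example project A starts with three first-choice votes for two seats; moving `s2` to their second choice C in step 3 is what leaves A with exactly two voters in step 4. Inputs where that does not work out hit the `ValueError`, which is the "complain loudly" part.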
Or: Jake's Guide to Algorithm Design
"For every n minutes of planning, you save 10n minutes implementing"
- Sun Tzu
Not always possible!
Try to work with others
Find slides at: