How We Matched You to Your CodeLabs Team

Julie Cover, CodeDay

Watch the slides live on your device!

1. The Problem, and Others Like It 

2. My Solution + Algorithm Design 

3. Lessons from Another Iteration

A Crash Course in Algorithms

Problems

Details Matter!!

Optimality

Runtime & O Notation

O(1)
O(\log{x})
O(x)
O(x^2)
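
As a rough illustration of these four classes, the sketch below counts the basic steps each kind of routine performs on an input of size x. The function names are made up for this example and do not come from the talk.

```python
def constant_steps(items):
    # O(1): look at a single element, no matter how long the input is.
    return 1 if items else 0


def logarithmic_steps(x):
    # O(log x): halve the remaining range until one candidate is left.
    steps = 0
    while x > 1:
        x //= 2
        steps += 1
    return steps


def linear_steps(items):
    # O(x): touch every element exactly once.
    return sum(1 for _ in items)


def quadratic_steps(items):
    # O(x^2): pair every element with every element.
    return sum(1 for _ in items for _ in items)


data = list(range(1024))
step_counts = {
    "O(1)": constant_steps(data),
    "O(log x)": logarithmic_steps(len(data)),
    "O(x)": linear_steps(data),
    "O(x^2)": quadratic_steps(data),
}
```

For 1,024 items the counts come out as 1, 10, 1,024, and 1,048,576 steps respectively.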

Runtime & O Notation

Hard Problems

O(2^x)
O(x\,!)
O(x^x)
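
To get a feel for why these count as hard, a small sketch can compare the step counts against a polynomial one. The `growth` helper is a name invented for this illustration.

```python
import math


def growth(x):
    # Number of steps each complexity class implies for an input of size x.
    return {
        "x^2": x ** 2,
        "2^x": 2 ** x,
        "x!": math.factorial(x),
        "x^x": x ** x,
    }


small = growth(5)
larger = growth(20)
```

At x = 5 the values are still comparable (25, 32, 120, 3125). At x = 20 the quadratic column is only 400, while 2^x is already over a million and x! has 19 digits.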

The Problem, and Others Like It

Phase 1: Priorities and Recommendations

Phase 2: Placement

from copy import deepcopy

# Pass 1: projects whose open seats exactly match their first-choice votes.
for id, project in all_project_data.items():
    if project["proj_size_remaining"] == project["num_first_choice"]:
        all_project_data, student_placements = place_students_of_choice(
            all_project_data, student_placements,
            id, 1, project["proj_size_remaining"])

# Pass 2: projects with at least as many open seats as first-choice votes.
for id, project in all_project_data.items():
    if project["proj_size_remaining"] >= project["num_first_choice"]:
        all_project_data, student_placements = place_students_of_choice(
            all_project_data, student_placements,
            id, 1, project["proj_size_remaining"])

# Pass 3: balanced placement with choices [2, 15]. Iterate over a snapshot
# because all_project_data is replaced inside the loop.
_all_project_data = deepcopy(all_project_data)
for id, project in _all_project_data.items():
    if project["proj_size_remaining"] >= project["num_first_choice"]:
        all_project_data, student_placements = place_students_of_choice_balanced(
            all_project_data, student_placements,
            id, [2, 15], project["proj_size_remaining"])

# Pass 4: balanced placement with choices [1, 2] on every project.
_all_project_data = deepcopy(all_project_data)
for id, project in _all_project_data.items():
    all_project_data, student_placements = place_students_of_choice_balanced(
        all_project_data, student_placements,
        id, [1, 2], project["proj_size_remaining"])

# Pass 5: remaining choices in groups of four, starting at 3, 7, 11, and 15.
for i in range(3, 16, 4):
    _all_project_data = deepcopy(all_project_data)
    for id, project in _all_project_data.items():
        if project["proj_size_remaining"] >= project["num_first_choice"]:
            all_project_data, student_placements = place_students_of_choice_balanced(
                all_project_data, student_placements,
                id, [i, i + 1, i + 2, i + 3], project["proj_size_remaining"])

Stable Marriage Problem

Solved by the Gale–Shapley algorithm

O(n^2)
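
For reference, here is a minimal sketch of the textbook Gale–Shapley procedure. It assumes two equally sized groups with complete, strictly ordered preference lists; the student and project names in the example are invented for illustration.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as {receiver: proposer}."""
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next list index each proposer will try
    engaged = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p  # receiver was unmatched, so it accepts
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p  # receiver trades up and frees its old partner
            free.append(current)
        else:
            free.append(p)  # rejected, p will try its next choice later
    return engaged


students = {"a": ["x", "y", "z"], "b": ["x", "z", "y"], "c": ["y", "x", "z"]}
projects = {"x": ["b", "a", "c"], "y": ["a", "c", "b"], "z": ["a", "b", "c"]}
matching = gale_shapley(students, projects)
```

Here `matching` pairs project x with student b, y with a, and z with c. No student and project both prefer each other over what they were given, which is what "stable" means.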

Stable Roommates Problem

  • Does not require preferences on all members
  • Only one group
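
One consequence of having only one group is that a stable pairing is not guaranteed to exist. The sketch below brute-forces a well-known four-person instance to show this. For simplicity it assumes everyone ranks everyone else, and the helper names are made up for this example.

```python
def pairings(people):
    # Yield every way to split the list of people into unordered pairs.
    if not people:
        yield []
        return
    first, rest = people[0], people[1:]
    for i, other in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, other)] + tail


def is_stable(pairs, prefs):
    # A pairing is unstable if two people prefer each other over their partners.
    partner = {}
    for a, b in pairs:
        partner[a], partner[b] = b, a

    def prefers(person, new, current):
        return prefs[person].index(new) < prefs[person].index(current)

    people = list(prefs)
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            if partner[a] != b and prefers(a, b, partner[a]) and prefers(b, a, partner[b]):
                return False
    return True


prefs = {
    "A": ["B", "C", "D"],
    "B": ["C", "A", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
all_pairings = list(pairings(list(prefs)))
stable_pairings = [p for p in all_pairings if is_stable(p, prefs)]
```

All three possible pairings are blocked by some pair, so `stable_pairings` ends up empty.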

Stable Marriage Problem

Solved by the Gale–Shapley algorithm

Half-strong Stability

Ties & Incomplete Lists - Perfect!

It's NP-Hard :(

We can still do this...

My Solution

(with a side of algorithmic design tips)

0. Data Collection

1. Elastic and Suggestions

2. APIs n' Stuff

3. Placement Algorithm

4. Manual Verification

Phase 0: Data

const mentorSchema = {
    mentor_id: "",
    name: "",
    company: "",
    bio: "",
    backgroundRural: true,
    preferStudentUnderRep: 2, // scale of 0-2
    okExtended: true,
    timezone: -7, // UTC offset in hours
    preferToolExistingKnowledge: true,
    proj_id: "",
    proj_description: "",
    proj_tags: [""],
    studentsSelected: 2,
};

const studentSchema = {
    id: "",
    name: "",
    rural: false,
    underrepresented: false,
    requireExtended: true,
    timezone: -3, // UTC offset in hours
    interestCompanies: [""],
    interestTags: [""],
    beginner: true,
};

Phase 1: Matching w/ Elastic

Elastic is great, but also frustrating

  1. Docs are from another era
  2. Python tooling is hit or miss
  3. Portability is Fantastic

Elastic - What's Your Point?

  • Elastic is incredibly powerful
    • Solves many unique problems
    • Queries themselves are almost language-agnostic
  • Complexity is inevitable
  • Bad docs and language APIs aren't
  • ... probably just buy the book
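
To make the "language-agnostic" point concrete: an Elasticsearch query is just a JSON document, so it can be built as a plain dict in Python and sent from any client. The sketch below is an assumption about what a recommendation query might look like, not the query used for CodeLabs. It assumes mentor documents are indexed with the field names from the mentor schema, and that `proj_tags` and `company` are mapped as keyword fields. `build_project_query` is a hypothetical helper.

```python
import json


def build_project_query(student, size=10):
    """Build an Elasticsearch query body recommending projects for one student."""
    filters = []
    if student["requireExtended"]:
        # Hard constraint: only mentors who are OK with the extended schedule.
        filters.append({"term": {"okExtended": True}})
    # Soft preferences: matching tags or companies raise the relevance score.
    should = [
        {"terms": {"proj_tags": student["interestTags"]}},
        {"terms": {"company": student["interestCompanies"]}},
    ]
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": filters,
                "should": should,
                "minimum_should_match": 1,
            }
        },
    }


example_student = {
    "requireExtended": True,
    "interestTags": ["web", "machine-learning"],
    "interestCompanies": ["ExampleCorp"],
}
query_body = build_project_query(example_student, size=5)
payload = json.dumps(query_body, indent=2)  # same JSON regardless of client language
```

The `payload` string could be sent as the body of a search request from Python, JavaScript, or curl without changes.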

Phase 2: APIs n' Stuff

It's super easy!

An amazing API Paradigm!

  1. Planning
    1. Is your problem hard?
  2. Revising / Breaking
  3. goto 1
  4. Implementation
  5. Patching
  6. Tuning
  7. Revising Again

Phase 3: Placement Algorithm

Or: Julie's Guide to Efficient & Practical Algorithm Design

"For every n minutes of planning, you save 10n minutes implementing"

- Sun Tzu, probably

Phase 3: Placement Algorithm

1. Start with projects that already have exactly the right number of first-choice votes. Assign those students by adding their information to the saved project dictionary. Remove those students' votes from all other projects to avoid duplicates, then remove the filled projects.

2. Next, for projects with fewer first-choice votes than they need, assign the students who voted for them first. Also remove those students' votes from other projects. Decrement `proj_size_remaining`.

3. Then assign second-, third-, and lower-choice votes as needed until `proj_size_remaining` = 0, then remove the project. Work in order (all second-choice votes, then all third-choice votes, and so on) so that each student gets the best-ranked choice possible. If multiple students are tied, assign the student with the fewest votes left in other projects first. Remember to remove each student from all other projects when their vote is saved.

4. Once all projects with fewer first-choice votes than needed are dealt with, only projects that started with more than enough first-choice votes remain. Because students have been removed along the way, these should have exactly the right number of first-choice votes left. Assign these students, and complain loudly if something is wrong.

  1. Multiple Steps - as many as needed
  2. Verbose, clear, and actionable descriptions
  3. Build in checks and metrics
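
The four steps above can be sketched as follows. This is a simplified illustration under assumed data structures, not the production code: `sizes` maps each project to its number of seats, `votes` maps each student to a list of projects with the favourite first, and `place` is a hypothetical name.

```python
def place(sizes, votes):
    remaining = dict(sizes)                            # open seats per project
    unplaced = {s: list(v) for s, v in votes.items()}  # students still waiting
    placements = {}

    def voters(project, choice):
        # Unplaced students who ranked `project` at position `choice` (1-based).
        return [s for s, v in unplaced.items()
                if len(v) >= choice and v[choice - 1] == project]

    def assign(student, project):
        placements[student] = project
        remaining[project] -= 1
        del unplaced[student]  # drops this student's votes on every project

    def votes_left_elsewhere(student, project):
        return sum(1 for q in unplaced[student] if q != project and remaining[q] > 0)

    first_counts = {p: len(voters(p, 1)) for p in sizes}

    # Step 1: projects with exactly the right number of first-choice votes.
    for p in sizes:
        if first_counts[p] == sizes[p]:
            for s in voters(p, 1):
                assign(s, p)

    # Step 2: under-subscribed projects take all of their first-choice voters.
    under = [p for p in sizes if first_counts[p] < sizes[p]]
    for p in under:
        for s in voters(p, 1):
            assign(s, p)

    # Step 3: fill them with 2nd, 3rd, ... choices, fewest remaining options first.
    for choice in range(2, max(len(v) for v in votes.values()) + 1):
        for p in under:
            candidates = sorted(voters(p, choice),
                                key=lambda s: votes_left_elsewhere(s, p))
            for s in candidates[:remaining[p]]:
                assign(s, p)

    # Step 4: over-subscribed projects should now have exactly enough voters left.
    problems = []
    for p in sizes:
        if first_counts[p] > sizes[p]:
            left = voters(p, 1)
            if len(left) != remaining[p]:
                problems.append(p)  # the built-in check: report, don't hide it
            for s in left[:remaining[p]]:
                assign(s, p)
    return placements, problems


sizes = {"P1": 2, "P2": 2, "P3": 2, "P4": 1}
votes = {
    "s1": ["P1", "P2", "P3"], "s2": ["P1", "P3", "P2"], "s3": ["P2", "P1", "P3"],
    "s4": ["P3", "P2", "P4"], "s5": ["P3", "P2"], "s6": ["P3", "P1", "P2"],
    "s7": ["P3", "P1", "P4"],
}
placements, problems = place(sizes, votes)
```

In this example P3 starts with four first-choice votes for two seats. Step 3 moves s5 to P2 (s5 has fewer options left than s4) and s4 to P4, which leaves exactly s6 and s7 for P3, so `problems` stays empty.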

Phase 4: Manual Cleanup - Sometimes

Not always possible!

Try to work with others

Lessons Learned in Round 2

  • Working around NP-Hard
    • Sometimes, unusual approaches are the most practical
  • Scopes Change
  • Algorithm > Language for Speed

Speed Round

Thanks for Watching!

CodeDay!

(it's pretty great)

www.codeday.org

CodeLabs Matching Tech Talk - Round 2

By Julie
