Explore topics discovered in the Pre-Clerkship Curriculum of Yale School of Medicine.
Topic modeling uses statistical algorithms to identify topics that describe the content of a large collection of texts, called a corpus. It finds semantically related words in the corpus and groups those words into topics. The meaning of a topic comes from the formal definitions of its words and from what those words mean in the context of the other words in the topic. Because a word's meaning depends on its context, the same word can appear in more than one topic.
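The page does not say which algorithm produced the YSM model. As an illustration only, the sketch below assumes Latent Dirichlet Allocation (LDA), a widely used topic model, fitted with collapsed Gibbs sampling in plain Python. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are hypothetical. Each topic comes out as a probability distribution over the whole vocabulary, so every word carries some weight in every topic and a word can rank highly in more than one.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit a toy LDA topic model with collapsed Gibbs sampling (assumed algorithm).

    docs: list of documents, each a list of word tokens.
    Returns (topic_word, doc_topic):
      topic_word[k][w] -- probability of word w in topic k
      doc_topic[d][k]  -- proportion of document d attributed to topic k
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]               # tokens in doc d assigned to topic k
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # times word w is assigned to topic k
    n_k = [0] * num_topics                                # total tokens assigned to topic k

    # Start by assigning every token to a random topic.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling its topic.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # A topic is likely if it is common in this document and
                # this word is common in that topic, so co-occurring words
                # tend to end up in the same topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


if __name__ == "__main__":
    # Hypothetical toy corpus, not YSM curriculum material.
    docs = [
        "heart artery blood pressure heart".split(),
        "neuron synapse brain neuron cortex".split(),
        "blood heart artery vessel".split(),
        "brain cortex synapse neuron".split(),
    ]
    topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
    for k, dist in enumerate(topic_word):
        top_words = sorted(dist, key=dist.get, reverse=True)[:3]
        print(f"topic {k}: {top_words}")
```

A real curriculum-scale model would normally use an established library and preprocessing (stop-word removal, lemmatization); this sketch only shows the core idea of grouping co-occurring words into topics.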
Topic modeling can identify how significant different concepts are to a corpus. Each topic is given a score based on the number of words in the corpus that belong to the topic. Topics with higher scores are more prominent in the corpus.
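The page does not say exactly how topic scores are computed. One common approach, assumed here, is to score each topic by the share of all word tokens in the corpus attributed to it: weight each document's topic proportions by the document's length and sum. The function names `corpus_topic_scores` and `rank_topics` and the numbers below are made up for illustration.

```python
def corpus_topic_scores(doc_topic, doc_lengths):
    """Score each topic by the share of all corpus tokens attributed to it.

    Assumed scoring rule (the page does not specify one):
    doc_topic[d][k] is document d's proportion for topic k (rows sum to 1),
    doc_lengths[d] is the number of word tokens in document d.
    The returned scores sum to 1.
    """
    num_topics = len(doc_topic[0])
    totals = [0.0] * num_topics
    for proportions, length in zip(doc_topic, doc_lengths):
        for k, p in enumerate(proportions):
            # Expected number of this document's tokens that belong to topic k.
            totals[k] += p * length
    corpus_size = sum(doc_lengths)
    return [t / corpus_size for t in totals]


def rank_topics(scores):
    """Topic indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


# Hypothetical proportions for three documents over two topics.
doc_topic = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
doc_lengths = [100, 50, 150]
scores = corpus_topic_scores(doc_topic, doc_lengths)
print(rank_topics(scores))  # → [0, 1]
```

Here topic 0 accounts for 205 of the 300 tokens (about 0.68) and topic 1 for 95 (about 0.32), so topic 0 ranks first even though document 1 is mostly about topic 1.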
Topic modeling can also uncover hidden relationships between documents in a corpus. Each document is represented by one or more topics, and each topic is given a score for how well it represents the words in that document. Comparing the topic scores of two documents can reveal whether they cover related concepts or have similar content.
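One common way to compare documents, assumed here since the page does not name a method, is cosine similarity between their topic-score vectors: documents with a similar mix of topics score close to 1, and documents about unrelated topics score close to 0. The helper names `cosine_similarity` and `most_similar` and the topic scores below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two topic-score vectors (1.0 = same mix of topics)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(doc_topic, query_index, top_n=3):
    """Return (index, similarity) pairs for the documents closest to doc_topic[query_index]."""
    query = doc_topic[query_index]
    scored = [
        (i, cosine_similarity(query, vec))
        for i, vec in enumerate(doc_topic)
        if i != query_index
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical topic scores for four documents over three topics.
doc_topic = [
    [0.80, 0.15, 0.05],
    [0.10, 0.85, 0.05],
    [0.75, 0.20, 0.05],
    [0.05, 0.10, 0.85],
]
print([i for i, _ in most_similar(doc_topic, 0, top_n=2)])  # → [2, 1]
```

Cosine similarity ignores the overall magnitude of the vectors and compares only the proportions, which suits topic scores that are already normalized per document; other distance measures for probability distributions would also work.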
A topic model was generated for the Pre-Clerkship Curriculum of the YSM Class of 2024 (August 2020 through December 2021). The model was built from the text of all curriculum materials given to students (slides, notes, readings, etc.).