The first step in using OASIS is to generate user data. This section provides a detailed overview of how user data should be prepared for simulation.

Data Formats

OASIS supports multiple social media platforms, each with its own user data format. The format you use must match the DefaultPlatformType or RecsysType configured for the platform.

Twitter Format (CSV)

For Twitter simulations, OASIS stores user data in CSV files. Each user (agent) requires the following information:

| Field | Description |
| --- | --- |
| user_id | A unique identifier assigned to each agent |
| name | The real name of the agent |
| username | The username of the agent within the system |
| created_at | The registration timestamp of the agent on the platform (agents will perform a sign-up action at the start of the simulation) |
| following_agentid_list | A list containing the IDs of other agents this user follows |
| previous_tweets | Initial tweets from the user, which are injected into the environment during simulation |
| user_char | A brief self-description of the agent (included in the agent’s system prompt to establish an initial identity) |
| description | Similar to user_char, serving as the agent’s self-description |

Example Twitter CSV Format

| user_id | name | username | following_agentid_list | previous_tweets | user_char | description |
| --- | --- | --- | --- | --- | --- | --- |
| 14529063 | user_9 | user9 | [32] | ["hello world"] | Beach bum, web developer, nerd 🤓, crocheter, avid reader 📚, a singer in the shower, a notorious heart breaker. I blog about books @ https://t.co/JjnKtEnq4R | Beach bum, web developer, nerd 🤓, crocheter, avid reader 📚, a singer in the shower, a notorious heart breaker. I blog about books @ https://t.co/JjnKtEnq4R |
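
A minimal sketch of producing such a file with Python's standard csv module. The column names come from the field table above. The helper name `write_twitter_users`, the sample values, the timestamp format, and the choice to store list fields as their Python string form (for example `[1]`) are assumptions for illustration, not confirmed OASIS behavior; check the loader you use for the exact encoding it expects.

```python
import csv
import io

# Column names taken from the field table above.
TWITTER_FIELDS = [
    "user_id", "name", "username", "created_at",
    "following_agentid_list", "previous_tweets",
    "user_char", "description",
]

def write_twitter_users(users, fileobj):
    """Write a list of user dicts as CSV rows (hypothetical helper)."""
    writer = csv.DictWriter(fileobj, fieldnames=TWITTER_FIELDS)
    writer.writeheader()
    for user in users:
        row = dict(user)
        # Assumption: list-valued fields are stored as their string form.
        row["following_agentid_list"] = str(user["following_agentid_list"])
        row["previous_tweets"] = str(user["previous_tweets"])
        writer.writerow(row)

# Made-up sample agent; the created_at format is an assumption.
users = [{
    "user_id": 0,
    "name": "user_0",
    "username": "user0",
    "created_at": "2024-01-01 00:00:00",
    "following_agentid_list": [1],
    "previous_tweets": ["hello world"],
    "user_char": "Web developer and avid reader.",
    "description": "Web developer and avid reader.",
}]

buffer = io.StringIO()
write_twitter_users(users, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`.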

Reddit Format (JSON)

For Reddit simulations, OASIS uses JSON files to store user data. Each user object contains the following fields:

| Field | Description |
| --- | --- |
| realname | The real name of the agent |
| username | The username of the agent within the Reddit platform |
| bio | A brief bio displayed on the user’s profile |
| persona | A detailed description of the agent’s personality, interests, and background (used for the agent’s system prompt) |
| age | The age of the agent |
| gender | The gender of the agent |
| mbti | Myers-Briggs Type Indicator of the agent |
| country | The country where the agent is based |
| profession | The agent’s profession or field of work/study |
| interested_topics | An array of topics the agent is interested in |

Example Reddit JSON Format

[
  {
    "realname": "James Miller",
    "username": "millerhospitality",
    "bio": "Passionate about hospitality & tourism. Exploring the world one destination at a time.",
    "persona": "James is a seasoned professional in the Hospitality & Tourism industry. With a knack for business and a keen interest in economics, he enjoys analyzing market trends and staying updated on the latest developments in the field. When not working, he loves traveling to exotic locations, sampling local cuisines, and experiencing different cultures. Follow for industry insights and travel inspiration!",
    "age": 40,
    "gender": "male",
    "mbti": "ESTJ",
    "country": "UK",
    "profession": "Hospitality & Tourism",
    "interested_topics": [
      "Economics",
      "Business"
    ]
  },
  {
    "realname": "Emma Hayes",
    "username": "emma_logistics_guru",
    "bio": "Passionate about transportation and logistics | ENFJ | Always seeking new connections and opportunities",
    "persona": "Emma Hayes is a 19-year-old logistics enthusiast currently studying Transportation, Distribution & Logistics. With a bubbly and outgoing personality (ENFJ), she loves discussing culture, society, and business trends. Emma is always expanding her knowledge in the transportation industry and enjoys connecting with like-minded individuals to exchange ideas and insights.",
    "age": 19,
    "gender": "female",
    "mbti": "ENFJ",
    "country": "UK",
    "profession": "Transportation, Distribution & Logistics",
    "interested_topics": [
      "Culture & Society",
      "Business"
    ]
  }
]
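
Before running a simulation, it can help to check that every user object carries all of the fields listed above. The following is a rough sketch of such a check, not part of OASIS: the function name `validate_reddit_users` is hypothetical, and the expected types are an assumption inferred from the example records.

```python
import json

# Field names from the table above; the expected types are an assumption
# inferred from the example records.
REDDIT_FIELDS = {
    "realname": str, "username": str, "bio": str, "persona": str,
    "age": int, "gender": str, "mbti": str, "country": str,
    "profession": str, "interested_topics": list,
}

def validate_reddit_users(raw_json):
    """Return a list of problems found in a JSON array of user objects."""
    problems = []
    for index, user in enumerate(json.loads(raw_json)):
        for field, expected_type in REDDIT_FIELDS.items():
            if field not in user:
                problems.append(f"user {index}: missing '{field}'")
            elif not isinstance(user[field], expected_type):
                problems.append(
                    f"user {index}: '{field}' should be {expected_type.__name__}"
                )
    return problems

complete_user = {
    "realname": "James Miller",
    "username": "millerhospitality",
    "bio": "Passionate about hospitality & tourism.",
    "persona": "James is a seasoned professional in Hospitality & Tourism.",
    "age": 40,
    "gender": "male",
    "mbti": "ESTJ",
    "country": "UK",
    "profession": "Hospitality & Tourism",
    "interested_topics": ["Economics", "Business"],
}
# Same profile with one field dropped, to show what a failure looks like.
incomplete_user = {k: v for k, v in complete_user.items() if k != "age"}

problems = validate_reddit_users(json.dumps([complete_user, incomplete_user]))
print(problems)  # ["user 1: missing 'age'"]
```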

Preparing User Data

When preparing user data for OASIS, consider the following regardless of platform:

  1. Diverse Personalities: Create agents with varied interests, opinions, and communication styles to simulate realistic social dynamics.

  2. Realistic Social Connections: For Twitter, the following_agentid_list should reflect plausible social networks based on shared interests or characteristics. For Reddit, users with similar interested_topics may interact more frequently.

  3. Initial Content: For Twitter, previous_tweets help establish the agent’s voice. For Reddit, the bio and persona fields serve a similar purpose.

  4. Consistent Identity: Ensure that the personality descriptors (user_char and description for Twitter; bio and persona for Reddit) align with the agent’s intended personality and behavior in the simulation.

  5. Platform-Specific Behaviors: Consider how users interact differently on Twitter versus Reddit. Twitter interactions tend to be brief and public, while Reddit discussions are often topic-focused and community-based.
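
As one illustration of point 2, follow lists can be derived from overlapping interests. This is a sketch under stated assumptions rather than an OASIS feature: the function `build_follow_lists` and the `interests` mapping (agent ID to a set of topics) are hypothetical, and the rule used here (follow every agent who shares at least one topic) is only one of many plausible heuristics.

```python
def build_follow_lists(interests):
    """Map each agent ID to the sorted IDs of agents sharing an interest."""
    follow_lists = {}
    for agent_id, topics in interests.items():
        follow_lists[agent_id] = sorted(
            other_id
            for other_id, other_topics in interests.items()
            # Skip self, and keep only agents with a non-empty topic overlap.
            if other_id != agent_id and topics & other_topics
        )
    return follow_lists

# Hypothetical agents keyed by ID, each with a set of interest topics.
interests = {
    0: {"Economics", "Business"},
    1: {"Culture & Society", "Business"},
    2: {"Sports"},
}

follow_lists = build_follow_lists(interests)
print(follow_lists)  # {0: [1], 1: [0], 2: []}
```

Each resulting list could then be used as the following_agentid_list value for the corresponding Twitter agent.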

In the next sections, we’ll explore how to use this user data to configure and run simulations in OASIS for different platforms.