In A Nutshell
Train models that detect harmful behaviors, help ensure user well-being, and uphold Anthropic’s principles of safety, transparency, and oversight, while enforcing our terms of service and acceptable use policies.
Responsibilities
- Build machine learning models to detect unwanted or anomalous behaviors from users and API partners, and integrate them into our production system.
- Improve our automated detection and enforcement systems as needed.
- Analyze user reports of inappropriate accounts and build machine learning models to detect similar instances proactively.
- Surface abuse patterns to our research teams to harden models at the training stage.
Skillset
- Have 4+ years of experience in a research/ML engineering or applied research scientist role, preferably with a focus on trust and safety.
- Have proficiency in SQL, Python, and data analysis/data mining tools.
- Have experience building trust and safety AI/ML systems, such as behavioral classifiers or anomaly detection.
- Have strong communication skills and the ability to explain complex technical concepts to non-technical stakeholders.
- Care about the societal impacts and long-term implications of your work.